If you are not using the latest breakthrough in CPLD technology, you are not getting the most out of your CPLD. Only the
new 1.8V CoolRunner-II RealDigital CPLD from Xilinx, offering a 100% digital core, gives you the performance, low power,
and features you are looking for with no price premium.

The best system performance in the industry

The CoolRunner-II RealDigital CPLD features the most I/O per macrocell, and advanced I/O interfacing including HSTL and
SSTL. System performance exceeds 400 MHz with the lowest dynamic current and lowest [...] lower compared to all other
1.8V devices!) The unique Clock Divider means no more dividing the clock off-chip. And the new RealDigital CPLDs offer
the smallest packaging with the highest level of design security.

1.8V CORE VOLTAGE CPLD COMPETITIVE CHART

Manufacturer             Xilinx                       Lattice          Altera
Device Family            CoolRunner-II                ispMACH 4000C    None
Standby Current          <100 µA                      2 mA             N/A
Clock Divider            YES                          NO               N/A
Clock Doubler            YES                          NO               N/A
I/O Standards Support    LVTTL, LVCMOS, HSTL, SSTL    LVTTL, LVCMOS    N/A
I/O Banks (max)          [...]                        2                N/A

Get the new CoolRunner-II Design Kit

Get started now with the new CoolRunner-II Design Kit, including a populated board, programming cable, design guide, and
software resource CDs... all for under fifty dollars! Just contact your local Xilinx distributor or visit
www.xilinx.com/coolrunner2 to order your kit today. And don’t forget the new CoolRunner-II RealDigital CPLDs [...]
ISE WebPACK™ tool, which you can download FREE right now.
LETTER FROM THE EDITOR


Why You Should Read Xcell


Programmable logic is now the overwhelming choice for all types of designs, from low-cost
consumer devices to high-performance switching systems. When you consider the many
advantages of programmable logic, you'll see that it’s not only the fastest, easiest, and most
flexible way to develop new products, it’s also the lowest cost and the most reliable solution in
most applications. There simply is no better way to develop new products.


The Xcell Journal will help you bring your imagination into reality through programmable logic
technology. Xcell is written by engineers who understand the challenges you face every day, and
we strive to bring you the latest information about the products and services from Xilinx and our
entire Partner Ecosystem, so you can make the best and most informed choices. We show you
how to save time, effort, and money, while creating better, more profitable products.


Xilinx is the undisputed world leader in programmable logic. We invented the Field Programmable
Gate Array (FPGA) back in 1984, and since that time we have continually created faster, cheaper,
more capable products with each new generation. In 1994, a basic 25K-gate XC4025 FPGA, built
on 0.6µ technology, cost you $654. This year the price of our smallest Spartan™-IIE device with
twice as many system gates is just $6.00. Plus, this year we'll be the first in the industry to introduce
90 nm technology to even further reduce costs. As you can see, programmable logic has matured
rapidly, and that’s why its use is growing faster than any other market segment.


Our Partner Ecosystem is also an important part of what makes Xilinx technology so attractive.
Our partners are the best in the business, and they provide a wide range of development tools,
intellectual property (IP), design services, and support products that help make your job easier
while making your products more successful. Together, we are changing the future of logic
design, bringing you advanced design solutions, helping you create new realities.


The Xcell Journal is your best source for information about this exciting technology. 


Carlis Collins 
Editor-in-Chief 


To subscribe to the printed Xcell Journal,
or to view the Web-based Xcell Online Journal, visit:
www.xilinx.com/publications/xcellonline/
The Xilinx Serial Tsunami Initiative is a comprehensive
set of programmable serial I/O solutions and
system strategies that will help you simplify designs,
increase performance, and lower system costs.


2003, ISSUE 45


Working with the Best

Xilinx now ranks number four on FORTUNE magazine's list of the
“100 Best Companies to Work For.” Here’s why it matters to you.


Use SpyGlass Predictive Analysis for Effective RTL Coding

Atrenta Inc.’s SpyGlass software uses a “look-ahead” engine based on
fast-synthesis technology to help you identify potential problems,
and fix them, early in the design process.


Serial Tsunami Requires Changes to SI Verification and Planning

Mentor Graphics’ HyperLynx products are designed to facilitate
the transition from parallel to serial connectivity.


Reduce Your RISC with a PicoBlaze Reference Design

Build your own soft microcontroller in a CoolRunner-II CPLD.
It’s easy, fast, and free.


Get RealFast RTOS with Xilinx FPGAs

Realtime operating systems implemented in Xilinx FPGAs
enhance performance, improve predictability, simplify design,
and lower system costs.


How to Make Smart Antenna Arrays

The Nallatech BenADIC card combines a 20-channel data acquisition system with
Xilinx XtremeDSP technology and Virtex FPGAs for high-performance digital
signal processing.





Xcell Journal


Working with the Best
Ride the Crest of the Serial Tsunami
Surf the Serial Wave to Success
Serial Tsunami Verification and Planning
Could Microprocessor Obsolescence Be History?
Reduce Your RISC - PicoBlaze Reference Design
Xilinx FPCs Target Cost-Sensitive Applications
Design Embedded Programmable Systems
Integrated FPGA and Microprocessor Solutions
Versatile MicroEngine Simplifies Embedded Designs
Push the DSP Performance Envelope
ISE 5.2i Further Reduces Your Design Costs
Prototype Xilinx Devices with MultiPRO Desktop Tool
DLK Enables Cost-Effective Design
SpyGlass Predictive Analysis for Effective RTL Coding
Get RealFast RTOS with Xilinx FPGAs
Virtex-II Pro FPGAs Deliver Proven Interoperability
Spartan-IIE Family Grows
CoolRunner-II Solutions Save Money
Reinventing the Signal Processor
How to Make Smart Antenna Arrays
FPGAs - Multiprocessing I/O Infrastructure for 3G
Bluetooth Wireless Technology BOOST Lite Processor
Decode MPEG-2 Video with Virtex FPGAs
Bughunters @ Siemens
support.xilinx.com: The Answers You Need
Learn Smarter, Faster
Xilinx Technology Enabled Deployment of Real-PCI Express
Xilinx Events and Tradeshows





from the top 


Working with the Best 


Xilinx now ranks number four on 
FORTUNE magazine's list of the 
“100 Best Companies to Work For.” 
Here’s why it matters to you. 






Spring 2003 


by Wim Roelandts
CEO, Xilinx Inc.


Xilinx is certainly a great company to work for, as
acknowledged by FORTUNE magazine’s ranking, and
we are a great company to work with, because we have
not slowed our fast pace of innovation nor decreased
our customer support activities, all critical to the
success of our customers. We weathered the recent economic
downturn in ways that have made Xilinx
stronger across the board. How did we keep
our technology advancing at a high rate,
continue to expand our support, and
gain market share, in a downward spiraling
economy? Here’s how.

A large part of our business 
came from the telecommunica- 
tions industry, which was 
severely hit by the downturn. 
Our revenues were cut almost 
in half, overnight. Our com- 
petitors were affected just as 
severely and as a result, many 
were forced to lay off a sizeable 
part of their workforce and to 
reduce both their product 
development and their cus- 
tomer support activities. We 
chose to make layoffs only as a 
last resort, and yet we needed to 
cut our expenses dramatically. 

We chose to take pay cuts, 
on a sliding scale, instead of 
doing layoffs. The average pay 
cut was 6%, rising to 20% for 
myself. Everyone shared the 
burden according to their abili- 
ty, and everyone was very 
happy to have some job securi- 
ty in a time when many of our colleagues in 
other companies were losing their jobs. 
Not only did our morale remain high, but 
our productivity increased as well, and we 
continued to produce our new technolo- 
gies even faster than before. 

Studies have shown that companies that
can avoid layoffs rebound more quickly
when the economy turns around, because
they can take advantage of every new
opportunity with a full staff. Companies that must
lay off their workers suffer from lower
morale, less productivity, and a slower return
to profitability. That’s what we are seeing
now: Xilinx is gaining market share, while
our competitors are struggling to keep up.


What We’ve Accomplished 

The technology development that we are 
doing today will not show up in your 
hands for several years. That’s why it is 
imperative that we keep our technology 
advancing, even when current revenues are 
slumping. Otherwise, we would find that 
when the economy turns around, we would 
not be ready to support your demand. Here
are some of the important advances we are
now introducing.





90 nm Low-Cost Fabrication Technology 
Using IBM’s most advanced, copper-based, 
90 nm semiconductor manufacturing 
process technology, IBM and Xilinx are 
manufacturing a new FPGA design in 
IBM’s new 300 mm chip fabrication facility.
This technology is a major reason why
our FPGAs will continue to lead the
industry in cost reduction.




This new process technology has resulted
in a 50% to 80% chip-size reduction
compared to any competing FPGA. IBM 
plans to manufacture this new product, in 
high volumes, in the second half of 2003. 
The new IBM $2.5 billion, 300 mm
chip-making facility combines, for the first time
anywhere, IBM chip-making breakthroughs
such as copper interconnects,
silicon-on-insulator (SOI), and low-k dielectric
insulation on 300 mm wafers.

Our investment in 90 nm manufacturing
technology will enable us to drive pricing
down to under $25 for a
one-million-gate FPGA, which represents a
savings of 35% to 70% compared to any
competitive offering. Such a significant 
reduction in pricing is possible due to the 
remarkable economies of scale involved 
with moving to next-generation 
manufacturing processes at 
increasingly finer geometries. 
Now we can achieve greater 
device densities and higher 
yields, making our FPGAs the 
logical alternative to ASICs. 
The rising costs of develop- 
ing ASICs on more advanced 
processes are well known. Not
only are non-recurring engineering
(NRE) charges rising
to over one million dollars per
design, but the engineering cost 
of developing and testing a 
complex ASIC on advanced 
processes such as 150 nm or 
130 nm technology can run up 
to 10 times that amount. With 
the deployment of our new 
FPGAs on 90 nm technology, 
Xilinx has resolved all of the 
deep sub-micron design chal- 
lenges for you. Using these new 
FPGAs, you will get all of the 
substantial cost advantages of the 90 nm 
technology without being forced to worry 
about the detailed circuit design issues 
associated with ASICs. You can concentrate 
on getting your system designed rather 
than on getting the chip itself to function. 
With our increased density, performance,
and system features, you get all the benefits
of an ASIC in a flexible, programmable
FPGA, without the risk and the huge NRE
costs. This is dramatically expanding the
total market for FPGAs, both in new
applications in existing markets and in totally
new markets.


Serial Tsunami 

We have pioneered the development of very 
high-speed serial I/O technology — we call it 
the Serial Tsunami Initiative. This new 
technology will solve many design chal- 
lenges, allowing you to replace old parallel 
busses with a far less expensive solution. 
With Serial Tsunami, you can significantly 
reduce costs, produce faster designs, reduce 
your PC board area, and create products 
that were never possible before. 

Other advantages of Serial Tsunami 
include reduced EMI, noise, cross talk, and 
skew, which makes your overall design more 
reliable. You can easily expand the I/O for
increased bandwidth, and the physical
interface can drive long signal traces on your PC
boards, making it ideal for backplanes.

The move to serial I/O technology is 
inevitable — there is no better way of cut- 
ting costs while keeping pace with current 
and future bandwidth requirements. Our 
Virtex-II Pro™ FPGAs with embedded 
RocketIO™ 3.125 Gbps transceivers, and 
the accompanying IP cores, reference 
designs, and support infrastructure, pro- 
vide the best possible way for you to realize 
all the advantages of serial I/O technology 
without the pitfalls. 

Virtex-II Pro FPGAs support major 
emerging serial interfaces such as PCI 
Express™, Gb Ethernet PHY, 10 Gb 
Ethernet XAUI, Fibre Channel, OC-48, 
OC-192 and OC-768 SONET for 
backplanes, Serial RapidIO™, and 
InfiniBand™. Each embedded RocketIO
transceiver in the Virtex-II Pro FPGAs is 
based on several generations of customer- 
proven Mindspeed SkyRail™ technology 
and can run from 622 Mbps to 3.125 
Gbps; there are up to 24 of these trans- 
ceivers available in one FPGA. 

Virtex-II Pro FPGAs also support
parallel interface standards such as SPI-3
(POS PHY Level 3), SPI-4.1 (Flexbus 4),
SPI-4.2 (POS PHY™ Level 4), 10 Gb
Ethernet Media Independent Interface
(XGMII), RapidIO, PCI, PCI-X, CSIX,
HyperTransport™, XSBI, and SFI-4.
Therefore, Virtex-II Pro FPGAs are the
only devices in production today that
enable you to bridge between the parallel
and serial interfaces, making them the
ultimate connectivity platform.

Xilinx and partners are delivering
pre-engineered IP cores for serial connectivity
protocols. Cores for Gb Ethernet MAC
with PHY, 10 Gb Ethernet MAC with
XAUI, PCI Express, Fibre Channel, and
reference designs for SONET OC-48
backplanes are available now, and more are
being added. You can also build higher-level
protocols using our embedded IBM
PowerPC™ processors. Reference designs
and evaluation/prototype boards help you
verify the performance of the transceivers
in real hardware.

Creating designs with speeds of 622
MHz, 3.125 GHz, 10 GHz, and beyond
will present challenges with PCB design and
signal integrity. Therefore, Xilinx and leading
EDA partners are solving this dilemma
with tools such as the Cadence
SPECCTRAQuest™ and HSPICE models.
We provide in-depth characterization data
for the RocketIO transceivers for flawless
system design. You will know exactly
what to expect for your specific design
situation, plus we provide best design
practices and PCB layout guidelines to
help you succeed.

We have also developed a new, open,
scalable, lightweight serial standard called
Aurora to help you transition from parallel
to serial interfaces. It is a link layer
protocol that can encapsulate and transport
any higher-level protocol. It is very
resource-efficient with low latency.
A single-lane reference design and the
specification are available for free download
at www.xilinx.com/aurora/. A quad-link
reference design will be available
during the first half of 2003.


And Much More 

I've only mentioned a few of our most 
important technologies. As you can see, the 
current downturn has not slowed our inno- 
vation, and we are fully ready to support 
your design requirements, both now and in 
the future. We are the fourth best company 
to work for, and the number one company 
to work with.
by Anil Telikepalli
Marketing Manager, Virtex Solutions
Xilinx, Inc.
anil.telikepalli@xilinx.com


Remember the days when serial I/O brought 
either USB or IEEE 1394 to mind? Not any 
more. A veritable tsunami of new and evolv- 
ing serial I/O standards is washing over the 
technological landscape, delivering promises 
of higher performance, lower costs, and sim- 
pler designs. Remarkable advances in semi- 
conductor technology and the availability of 
low-power CMOS serial transceivers are 
driving a migration of tidal wave propor- 
tions from parallel to serial interfaces. 
Xilinx is at the forefront of this move- 
ment. We have been shipping our flagship 
Virtex-II Pro™ FPGAs (www.xilinx.com/ 
virtex2pro) since the beginning of 2002. The 
Virtex-II Pro devices are the only Platform 
FPGAs with embedded 3.125 Gbps 
RocketIO™ CMOS serial transceivers. 
Designing cutting edge serial I/O technolo- 
gies is a challenging endeavor, but using seri- 
al I/Os to build your systems need not be. 


Broad Trend Toward Serial Connectivity

Experts agree that both single-ended and
differential parallel I/Os have reached
their physical limitations and cannot provide
a reliable and cost-effective means for
data rates greater than 1 Gbps. Serial I/O
provides performance benefits to high-speed
systems and cost benefits to low-speed
systems. This double benefit has
propagated waves of new serial interface
standards development.

The inevitable result is the current widespread
migration toward serial I/O across
many segments of the industry, including
PC and consumer, storage and servers,
communications networking, industrial
computing and control, and test equipment.

Serial system interfaces such as PCI
Express™, Serial RapidIO™, InfiniBand™,
1 Gb Ethernet, 10 Gb Ethernet
XAUI (10 gigabit attachment unit interface),
Fibre Channel, Serial ATA, SxI-5, and
TFI-5 are all available today, with many
more coming to address specific needs.


Figure 1 - Virtex-II Pro Platform FPGA


Serial Tsunami Initiative

Xilinx launched the Serial Tsunami
Initiative (www.xilinx.com/connectivity) to
sail the crest of the industry move from
parallel to serial interfaces, to reduce
costs, and to keep pace with current and
future bandwidth requirements. The
initiative is a vision for delivering a complete
suite of serial connectivity solutions
(including Platform FPGAs, IP cores,
design software and methodologies,
reference designs, solution boards, extensive
characterization data, and training classes)
to enable you to design your
next-generation products.

Unlike point solutions, the Xilinx
Serial Tsunami Initiative helps you surf the
surges of multiple, evolving serial
standards all the way across devices, IP, and
software as you build your systems.

The foundation of the Serial Tsunami
Initiative was our acquisition of RocketChips
Inc. two years ago. Today, a dedicated
R&D team in the Communications
Technology Division (CTD) is wholly
focused on improving, developing, and
delivering the capabilities that make up the
Serial Tsunami vision. Commenting on
this strategy, Wim Roelandts, president
and CEO of Xilinx, said: “The underlying
RocketIO technology that’s making it
possible for Xilinx to support multi-gigabit
systems is truly ‘rocket science’, not
something you'd want to simply license from an
external IP provider. Having the internal
expertise is a critical element of our
strategy to make serial a mainstream technology
by providing designers with a comprehensive,
scalable, and cost-effective solution.”

“As the only FPGA vendor shipping
platform devices with programmable 3.125
Gbps transceivers and IP cores for key serial
interface standards, it’s evident that our
approach is paying off,” he added. “Watch
for many more exciting developments in the
coming months as we continue to extend
our market lead in serial connectivity.”


Figure 2 - RocketIO serial transceiver in Virtex-II Pro FPGAs


Steve Berry, principal analyst for
Electronic Trend Publications, evaluated the
benefits of the serial I/O trend and the role
Xilinx plays in the movement: “Through its
leading technology and well-established
strategic partnerships, Xilinx is poised to
lead the industry in the transition to serial
interfaces. System architects will experience
dramatic improvements in bandwidth, pin
count, power, and signal integrity.”


Virtex-II Pro Platform FPGAs

Although it is widely accepted that serial 
I/O delivers significant advantages over 
parallel I/O methods, until now there was 
no flexible, cost-effective, general-purpose 
silicon support. The standards wars do not 
have clear winners, and the transition path 
is not obvious. 

Virtex-II Pro Platform FPGAs (Figure 1)
deliver state-of-the-art serial I/O with as
many as 24 RocketIO transceivers embedded
in the highest performance FPGA fabric
in production. The RocketIO
transceivers (Figure 2) operate at speeds
from 622 Mbps to 3.125 Gbps. The product
is available in a wide range of programmable
logic densities in 10 devices and
several packages. Virtex-II Pro FPGAs support
all major emerging serial interfaces
such as PCI Express, 1 Gb Ethernet PHY,
10 Gb Ethernet XAUI, Fibre Channel,
OC-48, OC-192, and OC-768 SONET
for backplanes, and Serial RapidIO.

Virtex-II Pro FPGAs also support parallel 
interface standards such as SPI-3 (POS PHY 
Level 3), SPI-4.1 (Flexbus 4), SPI-4.2 (POS 
PHY™ Level 4), 10 Gb Ethernet Media 
Independent Interface (XGMII), RapidIO, 
PCI, PCI-X, CSIX, HyperTransport™, 
XSBI, and SFI-4. 

The Virtex-II Pro FPGA is the only
device available today that enables bridging
across all these interface classes and
generations, making it the ultimate
connectivity platform.


Xilinx Tools and Solutions for Serial Connectivity

Using the 3.125 Gbps RocketIO integrated 
transceivers in Virtex-II Pro FPGAs, Xilinx 
and its partners are delivering pre-engineered 
IP cores for serial connectivity protocols. 
Cores for 1 Gb Ethernet MAC with PHY, 10 
Gb Ethernet MAC with XAUI, PCI Express, 
Fibre Channel, and reference designs for 
SONET OC-48 backplanes are available 
now, and more are being added. You can also 
build higher level protocols using the embed- 
ded IBM PowerPC™ processors. Reference 
designs and evaluation/prototype boards 
help you verify the performance of trans- 
ceivers in real hardware. 

Designing with parallel I/O meant you
were limited to speeds of 33 MHz to 133
MHz. In the serial world, these speeds leap
to 622 MHz, 3.125 GHz, 10 GHz, and
beyond. This raises challenges with PCB
design and signal integrity. Xilinx and leading
EDA partners such as Cadence are solving
this dilemma with tools such as
SPECCTRAQuest™ transmission media
and HSPICE™ (highly accurate simulation
program with integrated circuit
emphasis) models.

Xilinx provides in-depth characterization
data (Figure 3) for the RocketIO
transceivers for flawless system design using
our Virtex-II Pro FPGAs. You will know 
exactly how your design is expected to 
operate, and you will be supported every 
step of the way with best design practices 
and PCB layout guidelines. 


Aurora Reference Design 

Aurora is a new, open, lightweight, scala- 
ble serial interface provided by Xilinx to 
help you transition from parallel to serial 
interfaces. It supports any transport pro- 
tocol, has a compact architecture, and 
delivers low latency. A single-lane refer- 
ence design is available for free download 
at: www.xilinx.com/aurora. A quad link 
reference design will be available during 
the first half of 2003. 

Xilinx takes an active role in industry
standards organizations (including
PICMG, RapidIO Trade Association,
NPF, OIF, PCI-SIG, XFP, SMPTE, and
others) with the goal of providing complete
solutions synchronized with the
availability of new standards. An example
is the industry’s first PCI Express core,
which Xilinx released on the same date
that the PCI-SIG ratified the specification.
[Ed. note: See “Xilinx Technology
Enabled Instant Deployment of Real-PCI
Express” in this issue.]


Figure 3 - RocketIO characterization
using ML320 hardware platform:
eye diagram at receiver shows 3.125 Gbps
on 44-inch FR4, two connectors with
33% pre-emphasis.


Figure 4 - The PCI Express serial standard delivers 80X bandwidth with less than
one-third of the pins required by the PCI parallel protocol.
Panels: PCI 32/33, 5-client system (250 pins, 1 Gbps);
PCI Express 5-client system (80 pins, 80 Gbps).


Figure 5 - Serial XAUI allows 10 times longer PCB trace, lower pin count, and lower jitter/skew than XGMII.

The Serial Tsunami Initiative also offers
several levels of training about working
with serial technologies. These range from
an online introduction to in-depth face-to-face
classes. In addition, design support
and services are available from experienced
designers within Xilinx Design Services.


Manage Interoperability

The multiplicity of available and emerging
serial standards is reflected in the range of
interfaces embraced by available ASSPs.
Your optimum design could easily incorporate
multiple ASSPs equipped with a variety
of interfaces. Xilinx and ASSP vendors work
closely together to ensure interoperability
with Virtex-II Pro features and electrical I/O
standards as well as interface IP cores jointly
verified in hardware. [Ed. note: For a discussion
of ASSP and Virtex-II Pro FPGA interoperability,
see “Virtex-II Pro Platform FPGAs
Deliver Proven Interoperability” in this issue.]


Bridge Across Any Standard 

The leading parallel standards are not going 
to vanish overnight, and you might find it 
necessary to accommodate an older parallel 
standard in a design that is intended to
promote a newer serial standard. So how do
you manage the transition? Virtex-II Pro
FPGAs have solved this for you by supporting
both serial and parallel interfaces within
the same FPGA. Xilinx delivers IP cores
and reference designs to interface with
parallel standards such as 10 Gb Ethernet
XGMII, RapidIO, SPI-3, SPI-4.1, SPI-4.2,
HyperTransport, PCI, PCI-X, CSIX, XSBI,
SFI-4, and many others.

With Virtex-II Pro FPGAs you can
bridge across parallel and serial interfaces to
make a seamless transition. Plus, you can
continue to interface with the ASSPs that
best suit your needs based on their function,
not on their interface support.


Case Studies: Serial vs. Parallel

Serial helps you break the bandwidth bottleneck
with higher data rates using fewer
pins. This lowered pin count delivers
many advantages over traditional parallel
implementations.

• Low device pin counts: With fewer
  pins per connection, you save costs
  from smaller board real estate, smaller
  packages, fewer PCB traces and layers,
  and even smaller connectors.

• Expansion and scalability: High-speed
  serial pipes scale to support higher data
  rates as needs change.

• Improved EMI and noise immunity:
  With an embedded clock-data mechanism
  in place of wide parallel data and a
  separate clock, you get higher clock rates at
  lower EMI, noise, cross talk, and skew.

• Physical interfaces: Serial I/O physical
  interfaces can also drive long PCB
  traces on backplanes or even external
  cable (copper or optical).

One obvious way to analyze the costs
and benefits of I/O interfaces is to compare
performance per pin. Let us take a look at
two examples: PCI Express vs. PCI, and
10 Gb Ethernet XAUI vs. XGMII.


Case 1: PCI and PCI-Express

PCI is a higher bandwidth, parallel, shared-bus
standard, while the PCI Express protocol
is a new serial version. A 5-client
communication example (Figure 4)
contrasts the two standards:

• A 32 bit/33 MHz PCI interface requires
  50 pins, delivering about 1 Gbps
  (32 bits x 33 MHz = ~1 Gbps) total
  bandwidth that would be shared among
  all 5 clients (5 x 50 pins = 250 pins).

• The PCI-Express interface operates at
  2 Gbps data rate in each direction,
  delivering 4 Gbps full-duplex bandwidth
  over just four pins (that is, two
  differential pairs) for each point-to-point
  connection. The total aggregate
  bandwidth provided in the same
  5-client configuration is 80 Gbps
  (4 Gbps/connection x 4 connections/client
  x 5 clients = 80 Gbps).
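
The arithmetic in Case 1 can be checked with a short script. The following is a minimal sketch in Python that uses only the figures quoted above (32 bits at 33 MHz over 50 pins per client for PCI; 4 Gbps over 4 pins per connection, with 4 connections per client, for PCI Express). The function and variable names are illustrative and are not part of any Xilinx tool.

```python
# Figures are taken from the Case 1 text; all names are illustrative only.

def pci_shared_bus(bus_bits=32, clock_mhz=33, pins_per_client=50, clients=5):
    """Return (total_gbps, total_pins) for a shared parallel PCI bus."""
    total_gbps = bus_bits * clock_mhz / 1000.0  # one bus shared by all clients
    total_pins = pins_per_client * clients
    return total_gbps, total_pins


def pci_express(gbps_per_connection=4, pins_per_connection=4,
                connections_per_client=4, clients=5):
    """Return (total_gbps, total_pins) for point-to-point PCI Express links."""
    links = connections_per_client * clients
    return gbps_per_connection * links, pins_per_connection * links


pci_gbps, pci_pins = pci_shared_bus()
pcie_gbps, pcie_pins = pci_express()

print(f"PCI:         {pci_gbps:.3f} Gbps over {pci_pins} pins "
      f"({1000 * pci_gbps / pci_pins:.1f} Mbps per pin)")
print(f"PCI Express: {pcie_gbps} Gbps over {pcie_pins} pins "
      f"({1000 * pcie_gbps / pcie_pins:.0f} Mbps per pin)")
```

With the unrounded PCI figure of 1.056 Gbps, the bandwidth ratio works out to roughly 76X; the 80X quoted in Figure 4 follows from rounding PCI to 1 Gbps.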


Case 2: 10 Gigabit Ethernet XGMII and XAUI

XGMII is a full-duplex interface between
the MAC and PHY layers. XAUI is a
serialized version of this interface (Figure 5).

• XGMII is a full-duplex, parallel interface
  operating at 312.5 Mbps per wire
  using 74 pins for data, clock, and control.
  Its data rate is 10 Gbps each way.
  Due to signal count, skew, and other
  problems, XGMII fails to support
  multiple interfaces in a single chip or
  routes longer than a few centimeters.
  Hence, it is restricted to be only a
  chip-to-chip interface and has a maximum
  FR4 trace limitation of 2 inches.

• XAUI is a 4-lane, full-duplex, serial
  interface, with each lane running at
  2.5 Gbps data rate (3.125 Gbps baud
  rate). It requires 4 differential signal
  pairs in each direction and hence, 16
  pins in total, delivering 10 Gbps aggregate
  data bandwidth. Automatic de-skew
  and pre-emphasis allow XAUI to
  route as much as 20 inches of FR4 on
  PCBs, backplanes, and even cable. Its
  low pin count makes it highly scalable.

Fewer pins, higher bandwidth, scalability,
and significant cost savings are all driving the
move to serial.
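
The Case 2 numbers can be reproduced the same way. This is a minimal sketch in Python under two assumptions that the article does not state explicitly: that the gap between the 3.125 Gbps baud rate and the 2.5 Gbps data rate comes from the 8b/10b line coding used by XAUI, and that XGMII carries 32 data bits in each direction. All names are illustrative.

```python
# Figures come from the Case 2 text; 8b/10b coding and the 32-bit XGMII data
# path per direction are assumptions drawn from the 10 Gb Ethernet standard.

def payload_rate_gbps(baud_gbps, data_bits=8, code_bits=10):
    """Payload rate of a line-coded serial lane (8b/10b by default)."""
    return baud_gbps * data_bits / code_bits


XAUI_LANES = 4
xaui_lane_gbps = payload_rate_gbps(3.125)       # per lane, per direction
xaui_total_gbps = xaui_lane_gbps * XAUI_LANES   # per direction
xaui_pins = XAUI_LANES * 2 * 2                  # 2 wires per pair, 2 directions

XGMII_DATA_WIRES = 32                           # assumed data width per direction
xgmii_total_gbps = XGMII_DATA_WIRES * 312.5 / 1000  # per direction
xgmii_pins = 74

for name, gbps, pins, trace_in in (
    ("XGMII", xgmii_total_gbps, xgmii_pins, 2),
    ("XAUI", xaui_total_gbps, xaui_pins, 20),
):
    print(f"{name:5s} {gbps:4.1f} Gbps each way, {pins:2d} pins, "
          f"{1000 * gbps / pins:6.1f} Mbps per pin, up to {trace_in} in. of FR4")
```

Both interfaces carry the same 10 Gbps each way, so the comparison reduces to pins and reach: 16 pins against 74, and 20 inches of FR4 against 2.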


Conclusion

The Xilinx Serial Tsunami Initiative is
revolutionizing system architectures.
State-of-the-art serial I/O delivers scalable,
high bandwidth at low cost. As the industry
rapidly moves to serial connectivity, and
away from current parallel interface schemes,
you must either sink or swim. Xilinx and
Virtex-II Pro Platform FPGAs offer a whole
boatload of serial solutions from the initial
concept to the finished product. Check it out
at: www.xilinx.com/serialsolution/.
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DSPs into FPGAs 


DSP Algorithms into FPGA Solutions 
Don't waste your valuable time and resources. 


Let Dillon Engineering pave the way to realize 
your complex algorithms in FPGAs. 


We specialize in custom design services, with 


particular emphasis on FPGA-based DSP 
algorithms and high-bandwidth, real-time digital 
signal and image processing applications. 


Consider us to be a cost-effective extension to 
your team. Use your expertise to design new 
algorithms and our expertise to implement them 
in FPGAs. Speed your designs’ time-to-market by 
leveraging our proven proficiency and capability. 


www.dilloneng.com
Edina, MN 55424
52-636 2413 





Spring 2003 Xcell Journal 13 


Learn how to accelerate your multi-gigabit serial link design process.


by Donald Telian 
Technologist 

Cadence Design Systems, Inc 
donaldt@cadence.com 


The move to multi-gigahertz (MGHz) serial technology represents a sea change of tsunami proportions. A variety of industry forces are revolutionizing product design to accommodate the speed and throughput of serial data transfer.

This article will show you how to harness the power of the Xilinx Serial Tsunami Initiative. We'll look at helpful tools and effective techniques already being used by engineers today.


Effective Data Transfers 

Whether you’re moving bits down an MGHz serial link or moving money into your bank account, it’s important to make sure all the data is transferred correctly. You simply cannot afford to lose data.

Figure 1 illustrates two types of data transfers relevant in MGHz design. The first row shows the serial link itself. Data is sourced by the transmitter (Tx), transferred through the differential interconnect, and latched by the receiver (Rx). If all elements are not tuned to each other, the data is not transferred effectively. The transmission medium must be designed carefully, and all three elements (Tx, transmission medium, Rx) must be well-matched.

Similarly, the second row of Figure 1 
shows the design chain between Xilinx 
serial technology and your design process. 
Just as in the case of the serial link, the 
Xilinx technology must be delivered to you 
in a medium that matches your design 
process to ensure a clean data transfer. 

In cooperation with Cadence, Xilinx has developed the SPECCTRAQuest Design Kit as a way to effectively communicate the operation of the RocketIO™ MGHz transceivers found in the Xilinx Virtex-II Pro™ FPGAs. Later, we’ll examine the Xilinx-Cadence partnership and how you can use it to accelerate your design process and improve your products. But first, let’s take a closer look at the MGHz serial link itself.


How Serial Links Work 
Measuring the “opening” on an eye diagram is a common way to judge the effectiveness of serial transmission. Figure 2 superimposes eye diagrams of received signals in three slightly different test cases. In the green signal’s circuit, the transmitter, interconnect, and receiver impedances are well-matched. Here, all three elements are working together to produce an acceptably wide eye opening.

The other two waveforms in Figure 2 show what happens when only one of the three elements becomes imbalanced. In the blue signal’s circuit, a mismatch in the impedance of the transmission line causes erratic signal behavior and a collapse of the eye opening. Changing the transmitter’s impedance, however, causes an even further collapse in the red signal’s circuit behavior. Although the red signal appears more deterministic than the blue case, the transmitter in this case is not delivering enough voltage swing to the circuit to meet the thresholds in the receiver to extract the serial data.

Items that make an MGHz serial link work right include:

• Proper sizing of the transmitter for the required voltage swing

• An understanding of the differential impedance of the transmission medium (Z_differential is typically 2*(Z_uncoupled - Z_coupled))

• Matching that impedance with a termination resistor between the two nets at the receiver’s inputs

• Thorough characterization and accounting for the interconnect’s discontinuities and behaviors (such as vias, connectors, dielectric loss).

Figure 1 - How data is effectively transferred in a serial link, and its design chain.

Figure 2 - Mismatch in any one of the serial link components can adversely affect performance.
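As a rough illustration of the impedance items in the list above, the sketch below evaluates the differential-impedance rule of thumb quoted there and the reflection coefficient seen at a receiver whose termination resistor does not match it. The 60-ohm and 10-ohm inputs are made-up values chosen only to give a round result, and the reflection formula is the standard transmission-line expression rather than anything taken from the design kit.

```python
def differential_impedance(z_uncoupled, z_coupled):
    """Rule of thumb from the text: Z_diff = 2 * (Z_uncoupled - Z_coupled)."""
    return 2.0 * (z_uncoupled - z_coupled)


def reflection_coefficient(r_termination, z_line):
    """Fraction of the incident voltage reflected at a resistive termination."""
    return (r_termination - z_line) / (r_termination + z_line)


if __name__ == "__main__":
    # Hypothetical trace values (ohms), not measured data.
    z_diff = differential_impedance(z_uncoupled=60.0, z_coupled=10.0)
    print("Differential impedance (ohms):", z_diff)

    # Compare a matched termination with two mismatched ones.
    for r_term in (100.0, 120.0, 80.0):
        gamma = reflection_coefficient(r_term, z_diff)
        print(f"R_term = {r_term:5.1f} ohms -> reflection = {gamma:+.3f}")
```

A termination equal to the differential impedance reflects nothing; any mismatch sends part of each edge back down the line, which is one of the mechanisms that closes the eye in Figure 2.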


The SPECCTRAQuest RocketIO Design Kit
Recognizing that the MGHz design process has discontinuities too, Xilinx proactively developed the RocketIO Design Kit for Cadence’s SPECCTRAQuest high-speed PCB design tool. This kit was first introduced with the Virtex-II Pro FPGA in March 2002, and was described in an Xcell Journal article at that time (see support.xilinx.com/publications/xcellonline/partners/xc_speckit42.htm). The kit helps you implement the RocketIO technology by providing the electronic files and models that match and can be inserted directly into your design process. Multimedia tutorials within the kit help you quickly understand the steps involved.

Mohammad W. Ali, Ph.D., a technologist at Tellabs, found the kit to offer significant improvements in both the throughput and quality of his design process. He states, “The new silicon package board solutions in the design kit save me a lot of time, particularly for my multiboard simulations that involve different styles of routed 2.5 GHz differential pairs. With the new interfaces in this SPECCTRAQuest Kit, I can accomplish my simulation task 10 to 20 times faster.”

With RocketIO transceivers, signaling throughput has increased an order of magnitude. And with the accompanying design kit, the throughput of the design process has increased similarly, even with the challenges of MGHz design.

Design Chain Optimization
Great technology that is hard to use isn’t really all that great. New technologies have failed because they were just too hard to access or too complex to work with. That’s why Figure 1 shows the two parallel challenges that must be solved for high-speed serial communication to succeed:

1. Proper transmission of serial data from transmitter to receiver, and

2. Proper transfer of serial technology from Xilinx to you.

Focusing on the second challenge is what “design chain optimization” is all about.

Design chain optimization is the only way to achieve the 10X to 20X design task improvement that the RocketIO kit has to offer.

Figure 3 illustrates the design chain. 
Because the term “design chain” is not as 
common as “supply chain,” both are shown 
to help you understand their function and 
relationship to each other. Within the 
design chain, design kits of “virtual compo- 
nents” (in the form of models, EDA files, 
and databases) are transferred from the 
technology deployment group at one com- 
pany to the engineering group of another. 

In our example, the RocketIO kit effectively communicates the nuances of MGHz technology to Xilinx customers. This is done by avoiding the vagaries of textual datasheets, instead providing electronic files that can be easily inserted into your design process. These files are “executable specifications” that can quickly be understood by engineers all over the world, because the tool shows the RocketIO serial transceiver in a context with which they are familiar.


Bridging IC to PCB 

Just as all elements in a serial link must be matched, so must the elements in the serial design chain. But here Xilinx had a challenge: the model formats commonly used by PCB designers would not work with this new technology. In fact, the only accurate representation of the RocketIO transceiver was the model used to design the silicon, an IC-level model that only worked in IC design tools.

Figure 3 - Design and supply chains and the role of design kits.

As Figure 1 shows, the SPECCTRAQuest Design Kit answered this challenge and became an efficient “transmission medium” to bridge the worlds of IC and PCB modeling. New technology in the SPECCTRAQuest kit allows you to simulate arbitrary PCB layouts with complex IC models, all from the SPECCTRAQuest user interfaces commonly found in the high-speed PCB design process. If Xilinx had required PCB engineers to learn new IC simulation tools, it would have caused a mismatch in the design chain and hindered the adoption of RocketIO transceivers.

Wenwei Qiao, an engineer at Applied Materials, prefers using the Xilinx and Cadence kit’s pre-packaged complex silicon models within the SPECCTRAQuest environment because they can be manipulated much like simpler IBIS-style models. “In only 10 minutes after installation, I was able to begin simulating my multi-gigabit solution,” he reports. The user interface helps him focus on the design task and improve his product’s quality instead of wading through thousands of lines of text-based models and netlists.

Stéphane Tessier, a hardware engineer at Radical Horizon, a Montreal-based software-defined radio (SDR) solution provider, agrees that the kits are a “must-have” for developing multi-gigabit links. He found that the tutorial information in the kits shortened his learning curve, and he believes use of the kits will “reduce the number of board iterations.”


Conclusion 

A survey of engineers currently using the kits revealed that they unanimously find them valuable for serial MGHz design. Already, 75% of the engineers believe that using the combined Xilinx/Cadence kit has helped them improve their product’s quality.

During 2002, the integration of the SPECCTRAQuest and RocketIO design kits became an integral part of the Xilinx Serial Tsunami Initiative, listed among EDN magazine’s top 100 products for 2002.

The serial tsunami is here and growing. As you join fellow engineers in riding the serial wave, be sure to download your free copy of the SPECCTRAQuest Design Kit. It will help you put the power of MGHz signaling into your next design.


For More Information 

The SPECCTRAQuest RocketIO Design Kit can be downloaded free of charge at: support.xilinx.com/support/software/spice/spice-request.htm. Registration and click-license NDA are required.

Information about Cadence SPECCTRAQuest (SQ) and other free SQ design kits is available at www.specctraquest.com.

An executive white paper on design chain optimization is available at http://register.cadence.com/register.nsf/designChain/.








Serial Tsunami Requires
Changes to SI Verification


Mentor Graphics’ HyperLynx products
by Dave Kohlmeier

Director of Engineering, Simulation and 
Analysis Products, System Design Division 
Mentor Graphics, Inc. 
dave_kohlmeier@mentor.com 


We see third-generation I/O (3GIO) serial connectivity standards showing up everywhere: HyperTransport™, InfiniBand™, and PCI Express™. Why are all the major platform vendors moving to these new interconnect schemes? Basically, the speed of this “Serial Tsunami” is being driven by the inability to cost-effectively resolve signal and power integrity issues prevalent in standard synchronous, parallel, multidrop bus designs. Power and ground noise due to large high-frequency current demands in large-voltage-swing, single-ended designs; reflections from stubs due to multidrop connections; impossible delay/skew constraints for data versus clock nets; and other limitations have made cost-effective high-speed parallel design unattainable.

At the crest of the serial tsunami are differential signaling, embedded clocks, pre-emphasis, and point-to-point interconnect; all these elements give us a way around the electromagnetic barriers of parallel multidrop design. These changes require us to change our verification methodologies to assure successful designs. Let’s take a look at some of the changes in detail, focusing on how they affect the need for up-front planning and verification.


Differential Signaling 

Low voltage differential signaling (LVDS) is the real basis for the transition to 3GIO. LVDS, a two-wire system where return currents are expressly dealt with by the second closely coupled wire, is used to minimize ground-return loops and the cumulative effect produced by wide parallel buses on that return loop (otherwise known as simultaneous switching noise, or SSN). At these high frequencies, though, it is still very important to maintain a complete ground return path (no cuts or voids in the ground plane), especially for common mode currents.










Figure 2 - Differential impedance stackup planning 
using the stackup editor 


New 3GIO constraints for differential (diff) pair topologies include:

• Minimize coupling from adjacent pairs

• Match length of traces in a pair

• Minimize the number of vias used, but match use and location within pairs when they are used.


The idea here is that the more tightly the diff pair is coupled, the more external noise will be rejected. An aggressor pair will induce a signal in the victim pair, but it induces nearly the same signal in both wires if they are closely coupled. Any noise that affects both wires in a diff pair equally will have no effect on the resulting differential waveform.
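The noise-rejection argument can be seen in a few lines of arithmetic. The sketch below is a toy model, not a field-solver result: the signal and noise samples are invented, and the `coupling` parameter is a hypothetical knob describing how much of the noise seen by the positive wire also reaches the negative wire.

```python
def receive_differential(signal, noise, coupling=1.0):
    """Return the differential waveform (P minus N) seen by the receiver.

    The positive wire carries +signal plus the full noise; the negative wire
    carries -signal plus `coupling` times the noise. coupling=1.0 models a
    tightly coupled pair where both wires pick up identical noise.
    """
    p_wire = [s + n for s, n in zip(signal, noise)]
    n_wire = [-s + coupling * n for s, n in zip(signal, noise)]
    return [p - q for p, q in zip(p_wire, n_wire)]


if __name__ == "__main__":
    signal = [0.2, -0.2, 0.2, 0.2, -0.2]    # volts per wire, illustrative
    noise = [0.05, -0.03, 0.08, 0.0, -0.06]  # induced by an aggressor, illustrative

    tight = receive_differential(signal, noise, coupling=1.0)
    loose = receive_differential(signal, noise, coupling=0.8)

    print("ideal      :", [2 * s for s in signal])
    print("tight pair :", [round(v, 3) for v in tight])
    print("loose pair :", [round(v, 3) for v in loose])
```

With identical noise on both wires the subtraction removes it entirely; when the two wires see different amounts, the difference leaks into the differential waveform.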

So, it is important that a verification tool understands the self-coupling of each diff pair and the coupling between pairs, including broadside (up and down a layer) and edge-coupled (side to side on same layer) pairs. Figure 1 shows the EM field lines calculated by the Mentor Graphics HyperLynx™ 2D field solver for two diff pairs. Notice the difference in the number of lines (which indicate field strength) between members of a pair versus those between the pairs themselves.

Impedance planning is an important step with diff pairs where internal terminations might require you to maintain specific differential impedance. For this reason, we have included an impedance-planning dialog in the HyperLynx stackup editor, as shown in Figure 2. In the editor, you can simply set a priority parameter, such as trace width, and request a spacing value that allows you to reach your goal (say, a differential impedance of 100 ohms).
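To give a feel for what "fix the width, solve for the spacing" involves, here is a minimal sketch using widely quoted closed-form approximations for surface microstrip and edge-coupled differential microstrip. These formulas are a swapped-in simplification and are not how a 2D field solver such as the one in HyperLynx works; the stackup numbers are hypothetical and the results are only rough estimates.

```python
import math


def microstrip_z0(w, h, t, er):
    """Approximate single-ended microstrip impedance (ohms).

    w = trace width, h = dielectric height, t = copper thickness (same units),
    er = relative dielectric constant. Common closed-form approximation.
    """
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h / (0.8 * w + t))


def edge_coupled_zdiff(w, s, h, t, er):
    """Approximate differential impedance for edge-to-edge spacing s."""
    return 2.0 * microstrip_z0(w, h, t, er) * (1.0 - 0.48 * math.exp(-0.96 * s / h))


def spacing_for_target(z_target, w, h, t, er):
    """Invert edge_coupled_zdiff for s, with the trace width held fixed."""
    z0 = microstrip_z0(w, h, t, er)
    ratio = (1.0 - z_target / (2.0 * z0)) / 0.48
    if not 0.0 < ratio <= 1.0:
        raise ValueError("target not reachable with this width and stackup")
    return -h * math.log(ratio) / 0.96


if __name__ == "__main__":
    # Hypothetical stackup, dimensions in mils.
    w, h, t, er = 8.0, 10.0, 1.4, 4.3
    s = spacing_for_target(100.0, w, h, t, er)
    print(f"Spacing for 100 ohms differential: {s:.2f} mils")
    print(f"Check: Z_diff = {edge_coupled_zdiff(w, s, h, t, er):.2f} ohms")
```

The same pattern (hold one parameter as the priority, solve for the other) is what the planning dialog automates, with a far more accurate impedance calculation behind it.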


Vias 

At extremely high frequencies, vias increasingly introduce signal integrity (SI) problems. Why? Vias are a discontinuity in the transmission line.

Just visualize the nicely controlled impedance of a trace over a ground plane and then compare that to a vertical tube of copper with no corresponding ground return path: the impedance is clearly different. In these systems, it’s important to keep diff pair vias close together and to add ground stitching vias in the vicinity for shielding and common mode return.

However, it’s also important to simulate the impedance discontinuity. HyperLynx Version 7.0 gives you the ability to have via L and C calculated automatically, to specify L and C values for each via type, or to set a default via value for the entire board.
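One simple way to reason about a via described by an L and a C value is to treat it as a very short transmission line with characteristic impedance sqrt(L/C) and compare that with the trace impedance. The sketch below does exactly that; it is a first-order approximation with invented L and C values, not the calculation any particular simulator performs.

```python
import math


def via_impedance(l_henry, c_farad):
    """First-order via impedance, treating the via as a short line: sqrt(L/C)."""
    return math.sqrt(l_henry / c_farad)


def reflection_coefficient(z_discontinuity, z_trace):
    """Fraction of the incident voltage reflected where the trace meets the via."""
    return (z_discontinuity - z_trace) / (z_discontinuity + z_trace)


if __name__ == "__main__":
    z_trace = 50.0  # ohms, single-ended, illustrative
    # Hypothetical via parasitics: (inductance in H, capacitance in F).
    for l_via, c_via in ((1.0e-9, 0.4e-12), (1.0e-9, 0.5e-12), (1.0e-9, 0.8e-12)):
        z_via = via_impedance(l_via, c_via)
        gamma = reflection_coefficient(z_via, z_trace)
        print(f"L = {l_via * 1e9:.1f} nH, C = {c_via * 1e12:.1f} pF -> "
              f"Z_via = {z_via:5.1f} ohms, reflection = {gamma:+.3f}")
```

A via whose capacitance dominates looks like a low-impedance dip to the signal and produces a negative reflection; the larger the mismatch, the more energy is bounced back toward the driver.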


Transmission Line Losses 

Without going into a lengthy discussion on losses in transmission lines, let’s just stress the importance of your verification and planning tools in recognizing and supporting loss. As frequencies increase, AC losses become significant. Skin effect (where more current is forced to the surface of a conductor) and dielectric loss (heating of the dielectric as the EM waves travel through it) are the culprits. Especially in backplanes, where the trace length on FR4 can be substantial, losses will reduce and “smear” voltages at the receiver. This reduces your ability to have a clean “eye” for the receiving IC to sync to and extract the correct data. HyperLynx Version 7.0 includes the trusted “W” element in its simulator for robust loss support.
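To put a number on skin effect, the sketch below evaluates the standard skin-depth formula, delta = sqrt(rho / (pi * f * mu)), for copper. The resistivity is a typical room-temperature handbook value and is an assumption here; real PCB copper and surface roughness will shift the results.

```python
import math

MU_0 = 4.0e-7 * math.pi   # permeability of free space, H/m
RHO_COPPER = 1.68e-8      # ohm-metres, typical room-temperature value (assumption)


def skin_depth_m(freq_hz, resistivity=RHO_COPPER, mu_r=1.0):
    """Depth at which current density falls to 1/e of its surface value."""
    return math.sqrt(resistivity / (math.pi * freq_hz * MU_0 * mu_r))


if __name__ == "__main__":
    for freq in (100e6, 1e9, 3.125e9):
        depth_um = skin_depth_m(freq) * 1e6
        print(f"{freq / 1e9:6.3f} GHz -> skin depth = {depth_um:.2f} um")
```

Because the depth shrinks with the square root of frequency, the effective cross-section of the trace shrinks too, so conductor resistance rises roughly as the square root of frequency. That is one reason long FR4 backplane traces attenuate fast edges more than slow ones.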


High-Speed Models 

Modeling and simulation go hand- 
in-hand. Whether you are doing ana- 
log or digital simulation, you can’t get 
very far without models. In the SI 
business, the I/O Buffer Information 
Specification (IBIS) standard has 
been a huge benefit to systems 
designers. Virtually all IC vendors are 
now making I/O models available for 
their devices. 

As we enter this next decade with 3GIO, IBIS is moving to support subcircuit models in SPICE (Simulation Program with Integrated Circuit Emphasis) or VHDL-AMS in the proposed IBIS 4.1 (now BIRD 75) specification. In the meantime, simulation environments must support the models that are currently available, and those are predominately HSPICE (from MetaSoft).

HyperLynx Version 7.0 offers support for HSPICE models. These models are generally encrypted and require a license for the HSPICE simulator; Version 7.0 calls the HSPICE simulator from the HyperLynx environment. Model assignment and waveform analysis are the same as always, but behind the scenes HyperLynx prepares a SPICE netlist of the entire electrical topology (including all coupling and loss elements) and then invokes HSPICE, extracts the results, and presents them in the native HyperLynx environment. This includes any multibit stimulus patterns you have defined.

For example, if you want to simulate a Xilinx RocketIO™ multi-gigabit transceiver implementation, simply assign the RocketIO model just as you would an IBIS model and use either your true topology extracted from your layout system (HyperLynx supports trace model extraction from all major PCB vendors) or a “what-if” topology defined in HyperLynx. Then hit the simulate button and view the results that are displayed in an eye diagram based on HSPICE results.


Eye Diagrams

For us digital designers, an eye diagram (as shown in Figure 3) is nothing more than what we have been used to looking at in a logic analyzer for years: data bits and high and low voltage levels. The difference between a logic analyzer and an eye diagram is that in the eye, hundreds of bits are overlaid on each other so that we can see if there is a large enough window of time where the signals are reliably in one state or the other (necessary for the receiver clock recovery circuit to dependably extract the data).

Figure 3 - Oscilloscope view of an eye diagram including eye mask

As in digital simulation, we are required to create stimuli that can affect the signals in a realistic way, which in turn we see as a changing shape in the “eye.” Because the bit history will affect the analog result, an easy-to-use multibit stimulus editor has been added in HyperLynx Version 7.0 that allows you to create bit streams of ones and zeros for the simulator to drive the diff pair (Figure 4). Pseudo-random, 8-bit/10-bit encoded, and customer-defined patterns are supported, as well as random (uniform or Gaussian) jitter. This functionality, combined with easy-to-use differential probes, provides you with robust eye diagram support.

Figure 4 - Multibit stimulus generation dialog

3GIO systems typically specify what a sufficient eye should look like for a receiver to extract the clock and data. This specification is referred to as an “eye mask.” The ability to specify an eye mask is included in HyperLynx Version 7.0, giving you an easy way to “see” in the oscilloscope view if your resulting eye pattern meets the manufacturer’s specification.
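The idea of overlaying many bits and measuring the opening can be sketched without any simulator. The toy below drives a pseudo-random (PRBS7) bit stream through a crude first-order low-pass "channel", folds the waveform into unit intervals, and reports the vertical eye opening at one sampling phase. The channel model, the `alpha` loss parameter, and all names are assumptions made for illustration; a real analysis would use the extracted topology and device models described above.

```python
def prbs7(num_bits, seed=0x7F):
    """Pseudo-random bit sequence from the x^7 + x^6 + 1 shift register."""
    state = seed & 0x7F
    bits = []
    for _ in range(num_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def lossy_channel(bits, samples_per_ui=16, alpha=0.2):
    """Crude channel: each sample moves a fraction `alpha` toward the driven level.

    Smaller alpha means a slower, lossier channel and more dependence on
    earlier bits (inter-symbol interference).
    """
    level = 0.0
    wave = []
    for bit in bits:
        for _ in range(samples_per_ui):
            level += alpha * (float(bit) - level)
            wave.append(level)
    return wave


def eye_height(bits, wave, samples_per_ui=16, phase=None):
    """Vertical opening at one sampling phase: lowest '1' minus highest '0'."""
    if phase is None:
        phase = samples_per_ui - 1  # sample at the end of each unit interval
    ones = [wave[i * samples_per_ui + phase] for i, b in enumerate(bits) if b == 1]
    zeros = [wave[i * samples_per_ui + phase] for i, b in enumerate(bits) if b == 0]
    return min(ones) - max(zeros)


if __name__ == "__main__":
    bits = prbs7(254)
    for alpha in (0.2, 0.05):
        height = eye_height(bits, lossy_channel(bits, alpha=alpha))
        print(f"alpha = {alpha:4.2f} -> eye height = {height:.3f} of full swing")
```

An eye-mask check is then just a comparison of measurements like this one (and the corresponding horizontal opening) against the limits in the relevant specification.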


Conclusion 
High-speed, low-voltage differential 
serial interconnect is the wave of the 
future for both interboard and intra- 
board interconnects. The features 
you need in a verification and analy- 
sis tool are changing to support this transi- 
tion. Because tolerances are extremely tight 
in all aspects of these interconnect systems, 
the entire system must be simulated to 
assure first-pass success of the PCB design. 
In summary, consider a quote from an Intel® white paper on PCI Express: “...detailed simulation and validation are necessary to guarantee a successful design.” We at Mentor Graphics believe our latest release of the HyperLynx products will make this task easier and more productive. We wish you luck as you move into the gigahertz world of serial connectivity.

FPGA Design








New Thresholds. 


FPGA complexity is here. 
So are the tools to handle it. 





FPGA | In the world of FPGA, the only constant is change. As gate counts soar into the millions, designers require 
more robust tools and technologies to maximize speed and performance. Only Mentor Graphics has a fully inte- 
grated FPGA solution that gives you the power to handle today’s cutting edge designs. Our full suite handles your 
challenges in design creation, verification, synthesis, embedded systems, intellectual property and FPGA-on- 
board. Add in Mentor’s award-winning customer support and you'll be able to open any doors you want. View our 
Concept to Silicon video seminar at www.mentor.com/fpga 


©2002 Mentor Graphics Corporation. All Rights Reserved. Mentor Graphics and FPGA Advantage are registered trademarks of Mentor Graphics Corporation. 





Could Microprocessor 
Obsolescence Be History? 


Embedding soft processors in FPGA fabric offers a radical but robust new
solution that effectively eradicates the problem of processor obsolescence.





by Karen Parnell 

Product Marketing Manager, Automotive 
Xilinx, Inc. 

karen.parnell@xilinx.com 


Your biggest obsolescence problem is out-of-date microprocessors and microcontrollers. Processors have shorter life spans than ever and are often discontinued at short notice, victims of fluctuating consumer market trends and the demand for ever-greater speed. In addition, obsolescence is deliberately built into such items as consumer products, encouraging microprocessor manufacturers to abandon existing processors in pursuit of planned platform introductions, thus propagating a ripple effect of obsolescence. Even for designs coded in “C” (touted as being “portable code”), there are always architecture-specific instructions and features that hamper the changeover from the obsolete processor to the next-generation device. This changeover problem is exacerbated by different package options and I/O configurations, which can necessitate a complete board re-spin.

Soft processor cores offer an extremely attractive alternative to traditional approaches toward the problem of obsolescence, all of which are time-, labor-, and cost-intensive, and waste legacy development efforts.

A Look at Traditional Approaches

The predicament of automotive telematics equipment designers, for example, is representative of the problem. Although design and development time scales have shrunk from five years down to two, production is measured in years and the products remain active in the field for another 10 years plus. If we imagine a scenario in which every Electronic Control Unit (ECU) in a car contains at least one processor, and that every car contains up to 60 ECUs, this translates to a major problem every time a processor is obsoleted at relatively short notice.

There are several traditional ways of dealing with processor obsolescence. The applicability of any given solution depends upon a number of variables: the value of the application software, the projected life of the system, and the amount of time and money available to solve the problem.

The most expensive solution is to redesign the system around a new processor. Depending on the volume of the code, a redesign can cost hundreds of worker-years, much of it devoted to validation and testing. Not only is the huge investment in debugging and refining the existing software lost (a clear case of throwing the baby out with the bath water), but the solution is temporary at best. If the system has a long projected life, the same problem will recur every few years.

Another solution is the Last Time Buy (LTB), which, on the surface, appears to be the most cost-effective option. The problem with this approach is that the designer must guess how much product to buy for the life of his program. If he guesses wrong, he is faced with an even more difficult problem: a larger legacy investment that must somehow be upgraded.

Inserting a new processor along with software written to emulate the old one is, at present, better in theory than in reality, although it’s an appealing concept that does have some operational history. This option preserves the legacy software, so the process is relatively cheap and fast. But once again, the solution is not permanent. If the system has a long projected life, a new solution might have to be repeated every few years.

More important, software emulation is inherently a serial process. Because so much performance is consumed running the emulation rather than the application, it is slow. It has been shown empirically that emulation requires, on average, about 20 clock cycles of the new processor for every legacy instruction it executes.

In addition, emulation breeds further obsolescence, because the processor used as the emulation engine will itself become obsolete and may force an entire rewrite of the emulator.

Off-the-shelf solutions often merely substitute new headaches for old ones. Let’s say you need a processor with 10 UARTs, an interrupt controller, and access to a block of external flash memory. Although there are many off-the-shelf processors offering multiple UARTs and other desired peripherals, they typically have numerous other peripherals that would go unused in your system. So, not only are you paying for the additional peripherals, but you often have to place unused peripherals into a safe mode or otherwise disable them via software.

Decommissioning unused peripherals places an additional burden on software engineers, who not only have to make the processor peripherals operate correctly, but now have to write code for the parts of the processor that are not used. Clearly, purchasing an off-the-shelf solution in this scenario would be highly wasteful, not only in terms of initial cost, but also in wasted engineering time during the design process.

Figure 1 - Xilinx PicoBlaze and MicroBlaze soft processors
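The emulation penalty quoted earlier in this section (about 20 host clock cycles per legacy instruction) translates directly into a throughput budget. The sketch below works through that arithmetic; the clock rates and the legacy workload are hypothetical figures chosen for illustration, and the model ignores everything except the average cycle count.

```python
def emulated_legacy_mips(host_clock_mhz, host_cycles_per_legacy_instruction=20):
    """Legacy instructions per second (in millions) an emulating host can sustain."""
    return host_clock_mhz / host_cycles_per_legacy_instruction


def host_clock_needed_mhz(legacy_mips, host_cycles_per_legacy_instruction=20):
    """Host clock required just to keep pace with the original processor."""
    return legacy_mips * host_cycles_per_legacy_instruction


if __name__ == "__main__":
    # Hypothetical host clocks, in MHz.
    for host_mhz in (100.0, 200.0, 400.0):
        print(f"{host_mhz:5.0f} MHz host -> "
              f"{emulated_legacy_mips(host_mhz):5.1f} M legacy instructions/s")

    # Hypothetical legacy part executing 5 million instructions per second.
    print("Host clock needed to match a 5 MIPS legacy part:",
          host_clock_needed_mhz(5.0), "MHz")
```

Seen this way, most of the replacement processor's speed advantage is spent on the emulator rather than on the application.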


The Soft Processor Solution

The soft processor solution eradicates processor obsolescence and preserves many years of legacy code and development. The new approach is to own the soft processor core and embed it in an FPGA host. Not only can you port the core to multiple FPGA platforms, but you can also design the peripheral set to meet your exact design requirements, thus eradicating architecture compromises and wasted peripherals.

The Xilinx MicroBlaze™ soft processor gives you the luxury of a different approach. Now, you can start with a processor core and build the peripheral set to meet your exact requirements. You’ve eliminated silicon wastage because you implement only what you need. You’ve reduced software design complexity because no code need ever be written to disable unwanted processor functionality. Creating unusual processor configurations, which can be changed at any time to suit changes in the specification, is now a simple task.

Even after five or six years of field use, when the FPGA hardware may itself be nearing the end of its life, the soft processor core can simply be dropped into its new FPGA host, using the same “C” code. The hardware platform may need some PCB modifications, but the legacy code remains usable and intact.


MicroBlaze and PicoBlaze Soft Processors
Xilinx offers both a 32-bit MicroBlaze soft processor core and an 8-bit PicoBlaze™ soft core. The PicoBlaze processor runs at speeds of 116 MHz, yet occupies a tiny footprint of just 35 configurable logic blocks (CLBs). (See Figure 1.)

MicroBlaze Development Board Part Numbers

Vendor | Part Number | Description
Avnet (Silica) | ADS-SP2-MB-EVL | MicroBlaze/Spartan-II Evaluation Kit
Avnet (Silica) | ADS-VE-MB-EVL | MicroBlaze/Virtex-E Evaluation Kit
Avnet (Silica) | ADS-VE-MB-DEV | MicroBlaze/Virtex-E Development Kit
 | DS-KIT-MBLAZE-V2 | Virtex-II Embedded Development Kit
 | EMUL-MicroBlaze-PC | High-Speed Debugger for MicroBlaze Soft Processor

Table 1 - EDK Development boards





The MicroBlaze 32-bit soft processor core is the industry’s fastest soft processing solution. It runs at 150 MHz and delivers 100 D-MIPS. It features a RISC architecture with Harvard-style, separate 32-bit instruction and data buses running at full speed to execute programs and access data from both on-chip and external memory. A standard set of peripherals is IBM CoreConnect™ bus-enabled to offer MicroBlaze designers compatibility and reuse.


Xilinx EDK Solutions 
Xilinx Embedded Development Kits 
(EDKs), including the soft processor core 
and a standard set of peripherals, are available 
from Xilinx and its distribution partners 
(see Table 1). The kits include a complete set 
of GNU-based software tools, including 
compiler, assembler, debugger, and linker. 
MicroBlaze EDKs bought from Xilinx 
and its distribution partners also include 
development boards that support the 
Virtex™-E, Virtex-H, Spartan™-II, and 
Spartan-IE series of FPGAs. Table 2 sum- 


Soft Processor Architecture Bus 


MicroBlaze | 32-bit RISC | Harvard-style 


buses 


150 MHz 


32-bit 
instruction 
and data 

buses 


PicoBlaze 8-bit address 
and data buses 


Table 2 - Xilinx soft processors 


MIPS/Speed _Size 


100 D-MIPs 


35 MIPS 


116 MHz 


marizes the specifications for the two 
processor cores. 

Selected Xilinx FPGAs in the new IQ 
Solutions range have been qualified to 
-40°C/-40°F to 
IWe2@) 25/7 Eecimpctatmremtanee ance ate 


operate over the 


designed for use in automotive applica- 


tions, such as telematics systems. 


Conclusion 

Embedded in FPGA fabric, Xilinx soft 
processor cores such as MicroBlaze and 
PicoBlaze can eradicate processor obsoles- 
cence issues by providing a stable platform 
that is owned and configured by the 
designer. Used in combination with the 
new IQ Solutions extended temperature 
range FPGAs, they are ideal for such appli- 
cations as automotive telematics. 

Not only will you benefit from the flexibility, integration, and
upgradeability offered by programmable logic, you can now take
advantage of a processor tailored to your design needs that will not
go obsolete, and will ultimately save the time and money associated
with costly redesign.

For more information, visit these websites:


MicroBlaze Information
www.xilinx.com/microblaze/

PicoBlaze Information
www.xilinx.com/picoblaze/

Automotive IQ Solutions
www.xilinx.com/automotive/


Table 2 (continued)

Soft Processor | Size | FPGA Support | Support
MicroBlaze | 225 CLBs | Virtex, Virtex-E, Virtex-II, Virtex-II Pro, Spartan-II, Spartan-IIE | Embedded Development Kit (EDK): soft processor core, peripherals, GNU-based software tools (compiler, assembler, debugger, and linker)
PicoBlaze | 35 CLBs | Virtex, Spartan-II | Free-of-charge reference design and application note, assembler
Reduce Your RISC
with a PicoBlaze
Reference Design

[...] CoolRunner-II CPLD. It’s easy, fast, and free.
by Jesse Jenkins, Applications Manager
CPLD Business Unit 

Xilinx, Inc. 

jesse.jenkins@xilinx.com 


Would you like to design your own microcontroller, but don’t want all
the hassle of starting from scratch with either the architecture or
the support software?

Would you like to update older code from earlier microcontrollers to
run faster and with the new memory standards?

Would you like to take the same code to substantially lower power
dissipation?

Would you like to gain insight into the inner workings of a
microcontroller with in-depth simulation as well as code creation
support?


If the answer is yes to any of these questions, 
read on about the new Xilinx PicoBlaze™ 
microcontroller reference design. It’s here 
now, it’s fast, and maybe most important of 
all, it’s free. This article details the CPLD 
version of the PicoBlaze reference design, 
including where to get the application code, 
examples, and the cross assembler. With that 
under your belt, you are ready to start — but 
first, a little more detail. 


PicoBlaze Explained 


The PicoBlaze soft microcontroller is an 8-bit design that supports
an 8-bit data bus and 16-bit instruction bus. As you may have
guessed, the PicoBlaze design is based on the RISC (reduced
instruction set computer) “Harvard architecture” model with separate
data and instruction ports. The PicoBlaze design is written in VHDL,
and is intentionally documented so that the accompanying cross
assembler directly tracks the architecture.

The PicoBlaze version currently shipping 
supports 49 instructions that operate within 
any of several Xilinx CoolRunner™-II 
CPLDs. The speed will vary depending on 
exactly which instructions you wish to sup- 
port and which version of the architecture 
you choose. 


For instance, with the full instruction set and all instructions held
outside the CPLD, you can expect to achieve about 30 MHz performance.
By streamlining either the instruction set or the program, you can
triple performance to 90 MHz.

Naturally, the PicoBlaze microcontroller architecture takes advantage
of the two key CoolRunner-II features: high-speed execution and low
power consumption.


Add or Delete Instructions 

Figure 1 shows the PicoBlaze base architec- 
ture, but don’t restrict your thinking to that 
architecture alone. Think of it as a starting 
point. You are free to either add or delete 
capability as you see fit. 

For instance, you can trim instructions 
from the instruction set by merely com- 
menting them out of the VHDL. If you 
wish, you can also remove them from the 
assembler, but that is not required. You 
can also add instructions, if you have 
some application that can take advantage 
of essential instructions beyond those 
currently supplied. 

It’s also possible to do both — cut some 
instructions and add some instructions. 
For instance, most programmers use about 
20 instructions in their day-to-day pro- 
gramming. Select the 20 you typically use, 
remove the rest, and then program. If you 
discover a bottlenecking “inner loop” that 
could benefit from a single instruction cus- 
tomized for that specific task, go ahead and 
write the VHDL that will do it at hardware 
speeds. Remember, the PicoBlaze microcontroller uses DualEDGE
flip-flops within the processor to accomplish computation on both
clock edges.


A DSP Example 
To illustrate the ability of the PicoBlaze 
architecture to adapt, let’s look at an exam- 
ple from DSP. The code to bit-reverse a bus 
is a fundamental operation used in Fast 
Fourier Transforms. The value is then typi- 
cally driven out on the address lines as a 
critical step in the base algorithm. To do 
this in “standard” instructions would take 
multiple “mask and rotate” commands, 
creating a processing bottleneck. 

Figure 2 shows the basic operations in assembler-like steps to
display register contents. The algorithm starts with a byte of data
labeled A-H. This byte is first internally swapped (four rotates),
then successively, inner bits are picked off with Boolean AND/OR into
a target register that will build up the results, two bits at a time.
One pass through this results in the final register with the original
contents reversed. Depending on algorithm details, it can take
approximately 12 to 18 instructions. In this case, we dispense with
adding the overhead of loop management with pointers and counters and
unroll it.

In Figure 3, the “flip” instruction is added to the VHDL, the design
is recompiled, and the processor is “rewired” to add in this key
instruction. This method collapses many instructions down to some
gate rewiring with the synthesis tools. Very many bit-level
operations boil down to simply rewiring the CPU, and best of all, the
synthesizer does the work. Many other examples exist. See the links
at the end of this article.

Figure 1 - PicoBlaze architecture
(Legible block labels: Interrupt Control; CONSTANT DATA; PORT_ID;
READ_STROBE; WRITE_STROBE; OUT_PORT; ZERO & CARRY Flags; Interrupt
Flag Store; Program Counter; ADDRESS; Program Flow Control; Program
Counter Stack)

A|B|C|D|E|F|G|H   Initial value
E|F|G|H|A|B|C|D   Swap nibbles
0|0|0|1|0|0|0|1   Mask 2 bits
0|0|0|H|0|0|0|D   OR Int. Result
D|E|F|G|H|A|B|C   Right Rot Swap
0|0|H|0|0|0|D|0   Left Rot Int. Result
0|0|0|1|0|0|0|1   Mask 2 bits
0|0|H|G|0|0|D|C   OR Int. Result
C|D|E|F|G|H|A|B   Right Rot Swap
0|H|G|0|0|D|C|0   Left Rot Int. Result
0|0|0|1|0|0|0|1   Mask 2 bits
0|H|G|F|0|D|C|B   OR Int. Result
B|C|D|E|F|G|H|A   Right Rot Swap
H|G|F|0|D|C|B|0   Left Rot Int. Result
0|0|0|1|0|0|0|1   Mask 2 bits
H|G|F|E|D|C|B|A   OR Int. Result

Final result is bit-reversed

Figure 2 - Bit-reversal code steps
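
The mask-and-rotate sequence can be modeled in a few lines of Python to check the steps by hand. This is only an illustrative sketch, not PicoBlaze assembler or the reference design’s VHDL: the function names are invented here, and a short loop stands in for the unrolled instruction sequence described in the text. The `flip` function models what the single rewired instruction computes.

```python
def rotl8(x):
    """Rotate an 8-bit value left by one bit."""
    return ((x << 1) | (x >> 7)) & 0xFF


def rotr8(x):
    """Rotate an 8-bit value right by one bit."""
    return ((x >> 1) | (x << 7)) & 0xFF


def swap_nibbles(x):
    """Exchange the high and low nibbles (equivalent to four rotates)."""
    return ((x << 4) | (x >> 4)) & 0xFF


def bit_reverse_mask_rotate(x):
    """Software bit reversal: pick off two bits per step with mask 0x11."""
    swap = swap_nibbles(x)        # E F G H A B C D
    result = swap & 0x11          # 0 0 0 H 0 0 0 D
    for _ in range(3):            # the article unrolls these three steps
        swap = rotr8(swap)        # bring the next two bits under the mask
        result = rotl8(result)    # make room in the intermediate result
        result |= swap & 0x11     # OR the two masked bits in
    return result


def flip(x):
    """Model of the custom instruction: output bit i is input bit 7 - i."""
    return sum(((x >> i) & 1) << (7 - i) for i in range(8))
```

Both functions return the same value for every byte; the difference is that the first costs a dozen or more instruction cycles in software, while the second is pure wiring once synthesized.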
A|B|C|D|E|F|G|H

H|G|F|E|D|C|B|A

Figure 3 - Recoded instruction for bit-reversal (“flip”) operation


Processor Enhancement 

We just mentioned instruction set optimization, but it’s also
possible to add functionality. Remember that many microcontrollers
include on-board function blocks that have a payoff beyond the
instruction set. For instance, many 8-bit microcontrollers include
internal peripheral counters or timers, interrupt handlers, and DMA
circuits. With PicoBlaze, just add the right set of peripheral
capability within the chip, depending on the density of the
CoolRunner-II CPLD chosen. Table 1 shows the densities available in
CoolRunner-II CPLDs, and Table 2 gives some estimates of macrocell
usage for various add-on functions.

One very important thing to remember is that when choosing a function
to add in, select just the functionality actually needed, so you will
get the best usage from your choice. Bill Carter, one of the founders
of Xilinx, frequently comments that most people don’t really want or
need a UART (universal asynchronous receiver/transmitter), but only
an “RT.” That is, they select a function that comes with 50 options,
then only use two. They end up carrying along lots of unused
circuitry that they pay for but never really use. Don’t fall into
that trap. Select the functions you will need and get the cheapest,
fastest, lowest power solution possible. Note that most of the items
listed in Table 2 exist as separate reference designs and may be
found on the Xilinx website.

Table 1 - CoolRunner-II macrocell capacities and pertinent data
(Devices: XC2C32, XC2C64, XC2C128, XC2C256, XC2C384, XC2C512. Rows
include macrocells, maximum I/O, TSU (ns), and Fsystem (MHz).)

Table 2 - Common functions and approximate CoolRunner-II macrocell
usage


Performance Improvement 

Getting the most out of your design will be 
another step. A classic way to improve the 
design is to “tune” it. Observe its perform- 
ance behavior, identify where the processor 
is spending its time, discover what it is 
doing, and think through the best set of 
operations to improve. Then, implement a 
new version of the architecture and/or code 
and evaluate it again. 

One easy way to do that is with the CoolRunner-II Design Kit (see
Figure 4). Many target designs easily fit onto the 256-macrocell
XC2C256 that resides on that board. There is also a blank pinout site
for adding a 64-macrocell XC2C64, with signals already attached to
the XC2C256. Simply construct a small hardware performance monitor
that will time various code sections and report back the execution
time. That way, by examining the behavior over address space and
time, you can determine just how much time is spent doing the various
tasks.
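
The bookkeeping such a monitor performs can be sketched in software. The following Python model is an assumption-laden illustration, not the Figure 5 design: it assumes the monitor sees one program address per clock cycle, and the section names and address ranges are hypothetical.

```python
from collections import Counter


def profile(trace, sections):
    """Count clock cycles spent in each named address range.

    trace    -- iterable of program addresses, one sample per clock cycle
    sections -- dict mapping a name to an inclusive (start, end) range
    """
    cycles = Counter()
    for addr in trace:
        for name, (start, end) in sections.items():
            if start <= addr <= end:
                cycles[name] += 1
                break
        else:
            cycles["other"] += 1  # address outside every named section
    return cycles


def report(cycles, clock_mhz):
    """Return {name: (microseconds, fraction of total cycles)}."""
    total = sum(cycles.values())
    if total == 0:
        return {}
    return {name: (n / clock_mhz, n / total) for name, n in cycles.items()}


# Hypothetical code sections and a short synthetic address trace.
sections = {"init": (0x00, 0x0F), "inner_loop": (0x10, 0x1F)}
trace = [0x00, 0x01] + [0x10, 0x11, 0x12] * 10 + [0x40]
summary = report(profile(trace, sections), clock_mhz=30)
```

In this synthetic trace the inner loop accounts for 30 of 33 cycles, which is the kind of result that would point to it as a candidate for a custom instruction.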

Figure 5 shows a simple approach to doing this operation. With care,
both CPLDs can communicate through a PC parallel port via their JTAG
boundary scan chains. Performance monitoring can help decide which
aspects of the PicoBlaze design to perform in software and which to
embody in hardware. One beautiful thing about building functions out
of programmable logic is that different experiments to focus on
specific performance targets are easily developed, giving you highly
tuned, fast designs.

And you don’t have to worry about power enhancement. CoolRunner-II
CPLDs are already the lowest power CPLDs available today, and
PicoBlaze is a very competitive low power microcontroller.

Figure 4 - CoolRunner-II Design Kit

Figure 5 - PicoBlaze performance monitor


PicoBlaze Cross Assembler 

As mentioned before, the PicoBlaze cross 
assembler is so well documented that a 
direct correspondence between assembly 
code and VHDL in the PicoBlaze design 
file already exists. The translator is written 
in ANSI-C and is assembled on Microsoft 
assemblers. The cross assembler is highly 
transportable and supports multiple output 
file types. For instance, it produces a bina- 
ry output file ready to load into external 
EPROM in Intel hex format. 
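
For readers unfamiliar with the format, the sketch below shows how Intel hex records are built: a colon, a byte count, a 16-bit address, a record type, the data bytes, and a two’s-complement checksum. It is a generic Python illustration of the file format only; the function names are invented here, and it says nothing about how the PicoBlaze cross assembler itself is implemented or how it orders instruction bytes.

```python
def ihex_record(address, data, record_type=0x00):
    """Build one Intel hex record for a 16-bit address."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF,
                  record_type]) + bytes(data)
    checksum = (-sum(body)) & 0xFF  # two's complement of the byte sum
    return ":" + (body + bytes([checksum])).hex().upper()


def ihex_image(data, origin=0, row=16):
    """Split a byte image into data records and append the EOF record."""
    lines = [ihex_record(origin + i, data[i:i + row])
             for i in range(0, len(data), row)]
    lines.append(ihex_record(0, b"", 0x01))  # end-of-file record
    return lines
```

Calling `ihex_image(bytes(range(20)))` yields two data records (16 bytes, then 4 bytes) followed by the end-of-file record `:00000001FF`.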

The cross assembler also produces the 
essential modeling files for the VHDL sim- 
ulator. You can instantly analyze your pro- 
duced code with high-speed simulations to 
determine the functionality and effective- 
ness of the code you have implemented. 
Then download the code into the CoolRunner-II Design Kit and see
that it actually does work as expected.


Conclusions and Recommendations

This introduction to designing PicoBlaze 
microcontrollers was written to stimulate 
your imagination to discover the fascinat- 
ing world of creating your own CPUs. 
Once you start, it can be addictive. You can 
easily alter the processor to be 16 or even 
32 bits wide — or even non-binary powers. 
Then you will discover which instructions 
burn through the macrocells and what the 
new speed limits will be. 


Do you need some instructions for 8 bits and others for 16? It’s up
to you. Application areas where these kinds of processors make sense
include industrial control, low-power portable DSP (brainwaves, EKG,
medical), and cryptography. Did you know that most cryptographic
operations are bit-level operations and almost never do floating
point arithmetic?

The PicoBlaze microcontroller reference design for CPLDs has been
built, tested, and is now available over the Internet, free to the
user.


Xilinx engineer Ken Chapman, who received additional support and encouragement 
from Henk van Kampen at Mediatronix BV, developed the original reference design. 
The CPLD version was created by Scott Lien, who also wrote the PicoBlaze cross 
assembler and the application note that is on the Xilinx website. The VHDL and
cross assembler (source and executable) links are shown in xapp387 (see below).


For more information, see the following:

CoolRunner-II Design Kit:
www.xilinx.com/products/cpldsolutions/demoboard.htm (purchasing details)

CoolRunner-II Application Notes
www.xilinx.com/xapp/xapp375.pdf (timing model)
www.xilinx.com/xapp/xapp376.pdf (logic engine)
www.xilinx.com/xapp/xapp377.pdf (low-power design)
www.xilinx.com/xapp/xapp378.pdf (advanced features)
www.xilinx.com/xapp/xapp379.pdf (high-speed design)
www.xilinx.com/xapp/xapp380.pdf (cross point switch)
www.xilinx.com/xapp/xapp381.pdf (demo board)
www.xilinx.com/xapp/xapp382.pdf (I/O characteristics)
www.xilinx.com/xapp/xapp383.pdf (single error correction, double error detection)
www.xilinx.com/xapp/xapp384.pdf (DDR SDRAM interface)
www.xilinx.com/xapp/xapp387.pdf (PicoBlaze microcontroller)
www.xilinx.com/xapp/xapp388.pdf (on-the-fly reconfiguration)
www.xilinx.com/xapp/xapp389.pdf (powering CoolRunner-II CPLDs)

CoolRunner-II Data Sheets
www.xilinx.com/bvdocs/publications/ds090.pdf (CoolRunner-II CPLD family data sheet)
www.xilinx.com/bvdocs/publications/ds091.pdf (XC2C32 data sheet)
www.xilinx.com/bvdocs/publications/ds092.pdf (XC2C64 data sheet)
www.xilinx.com/bvdocs/publications/ds093.pdf (XC2C128 data sheet)
www.xilinx.com/bvdocs/publications/ds094.pdf (XC2C256 data sheet)
www.xilinx.com/bvdocs/publications/ds095.pdf (XC2C384 data sheet)
www.xilinx.com/bvdocs/publications/ds096.pdf (XC2C512 data sheet)

CoolRunner-II White Papers
www.xilinx.com/publications/products/cool2/wp_pdf/wp165.pdf (chip scale packaging)
www.xilinx.com/publications/whitepapers/wp_pdf/wp170.pdf (security)


Xcell Journal


Xilinx field programmable controllers combine the power of the
32-bit MicroBlaze soft processor cores with the versatility of
Spartan-IIE FPGAs to deliver the greatest number of I/Os for
low-cost processing solutions.


by Helen Yu 

Processor Solutions Marketing Manager 
Xilinx, Inc. 

helen.yu@xilinx.com 


Staying ahead of the competition is getting
tougher every day. Cost pressures,
changing standards, and device obsoles- 
cence are just a few of the challenges. To 
maintain leadership, you need a low-cost, 
competitive processing solution that’s cus- 
tomizable throughout the entire design 
cycle and can quickly be brought into 
high-volume production. 

The Xilinx field programmable controller (FPC) solution allows you to
create low-cost, customized processors with the peripherals, memory,
and logic you want, all on a single, cost-optimized Spartan™-IIE
FPGA. With the flexibility to allow integration of other intellectual
property (IP) cores on the FPGA fabric, the Spartan-IIE family
presents an ideal embedded solution.
Traditional Design Issues

The introduction of embedded soft processors has offered substantial
benefits to the world of digital electronics. The industry’s huge
appetite for increasingly intelligent and sophisticated control
systems has dictated the rapid advancement of processor technology.
Perhaps most prevalent of all is the huge leap in demand for embedded
microcontrollers and microprocessors.

If you are using a traditional microcontroller unit (MCU) or
microprocessor unit (MPU) in your application, selecting the proper
device is one of the most critical decisions that will ultimately
determine the success or failure of your design. Typically, you must
address a number of issues, including:
• Is the MCU or MPU affordable? Does it minimize the overall cost of
  the system while still fulfilling the design specification?

• Does the unit have the required number of I/Os? Too few can’t do
  the job; too many can lead to excessive cost.

• Does the device have all the required peripherals? Can you add your
  own? Does the unit include peripherals you don’t need?

• Are you paying for unneeded IP?

• Will the MCU or MPU be available over the long term? Will the
  processor become obsolete?


These questions are just some of the criteria to consider when
choosing a traditional MCU or MPU for your application. The Xilinx
FPC solution, however, makes many of these concerns irrelevant.


The FPC Solution 

Using the 32-bit MicroBlaze™ soft 
processor core in the Spartan-IIE FPGA, 
the FPC offers a true low-cost solution for 
real-time processor control. The combina- 
tion of a high-performance soft processor 
and a low-cost FPGA enables you to rap- 
idly develop programmable systems for 
cost-sensitive applications. 
Figure 1 - Spartan-IIE FPGA block diagram


The enhanced integration of embedded 
applications means fewer interface issues, 
and processor-based designs developed in 
the FPGA can more easily be updated 
without changing the PC board. 


Spartan-IIE FPGAs

The Spartan-IIE 1.8V family of FPGAs achieves high-performance,
low-cost operation through advanced architecture and the latest
semiconductor technology. The seven-member family (see “Spartan-IIE
Family Grows” in this issue) offers densities ranging from 50,000 to
600,000 system gates. Spartan-IIE devices also provide system clock
rates beyond 200 MHz.

The Spartan-IIE FPGAs have a flexible, programmable architecture of
configurable logic blocks (CLBs), surrounded by a perimeter of
programmable input/output blocks (IOBs). There are four delay-locked
loops (DLLs), one at each corner of the die. Two columns of block RAM
lie on opposite sides of the die, between the CLBs and the IOB
columns. The XC2S400E has four columns and the XC2S600E has six
columns of block RAM. A powerful hierarchy of versatile routing
channels interconnects these functional elements. Figure 1 shows a
block diagram of a Spartan-IIE FPGA device.

This flexible platform is an ideal base for implementing a controller
system. An embedded microcontroller takes the concept of integration
one stage further by permitting you to embed the controller system
into a small section of a programmable device. No longer does the
microcontroller have to exist in a stand-alone package; it can now be
embedded deep within custom hardware.

Spartan-IIE FPGAs are customized by loading configuration data into
internal static memory cells. Because these cells permit unlimited
reprogramming cycles, reprogramming becomes a viable upgrade path for
future product enhancements. Therefore, Spartan-IIE FPGAs are ideal
for shortening product development cycles while offering a
cost-effective solution for high-volume production.

In addition, the Spartan-IIE family
delivers a cost-effective platform with 
high numbers of I/Os to provide excellent 
I/O expansion (up to 514 user I/Os). 
Table 1 compares the number of I/Os in 
two traditional microcontrollers against 
the number of I/Os delivered by two 
Spartan-IIE FPGAs. 


MicroBlaze Soft Processor 
The MicroBlaze 32-bit RISC soft processor is a true 32-bit processor,
supporting 32-bit bus widths. It is a RISC-based engine with a 32-bit
LUT RAM-based register file, with separate instructions for data and
memory access.

The MicroBlaze soft processor supports both on-chip block RAM and
external memory. All peripherals use the same IBM CoreConnect™ OPB
bus as the IBM PowerPC™ processor, which means the processor
peripherals are hardware compatible with the PowerPC processors on
Virtex-II Pro™ FPGAs.

The MicroBlaze embedded system, including the MicroBlaze core and
selected processor peripherals, is shown in Figure 2. Several
peripherals are available to support the MicroBlaze processor,
including memory controllers, UART, GPIO, I²C, 10/100 Ethernet MAC,
and many more.

The MicroBlaze core offers the flexibility and scalability of
embedded processor programmable logic devices. The MicroBlaze
processor requires less than half the logic resources yet offers more
than twice the performance of competing soft processors, as measured
by industry-standard Dhrystone-MIPS (D-MIPS) benchmarks. Delivering
49 D-MIPS of performance at 75 MHz, the MicroBlaze processor occupies
only 1,050 logic cells in the Spartan-IIE FPGA.
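
A quick way to compare benchmark figures quoted at different clock rates is to normalize them to D-MIPS per MHz. The short Python calculation below is only a worked example; it uses the 49 D-MIPS at 75 MHz figure given here and the 100 D-MIPS at 150 MHz figure quoted for the core elsewhere in this issue, and the function name is invented for illustration.

```python
def dmips_per_mhz(dmips, clock_mhz):
    """Normalize a Dhrystone score to the clock rate it was measured at."""
    return dmips / clock_mhz


# 49 D-MIPS at 75 MHz (Spartan-IIE figure quoted above).
spartan_iie = dmips_per_mhz(49, 75)

# 100 D-MIPS at 150 MHz (headline figure for the core).
headline = dmips_per_mhz(100, 150)
```

Both work out to roughly 0.65 to 0.67 D-MIPS per MHz, so the two quoted figures describe essentially the same per-clock efficiency at different clock speeds.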


FPC Applications 

As shown in Figure 3, FPCs have significant applications in the
traditional 16- and 32-bit microcontroller and microprocessor
markets, which include automotive, industrial, and high-end consumer
applications.


Automotive 

The modern automobile is replete with 
microcontroller-based systems providing 
automated control for just about every 
conceivable part of the car. Braking sys- 
tems use microcontrollers to deliver 
advanced safety features, such as ABS and 
traction control. Windshield wipers are 
controlled to bring us timed interval 
wipes and even rain-sensitive automatic 
wiper activation. Heating controls for the 
vehicle interior monitor multiple zones 
within the passenger compartment, auto- 
matically adjusting the supply of air to 
maintain the desired temperature. Seats 
even remember the favored positions for
different drivers of the car and readjust
themselves automatically.


Spring 2003 





Figure 2 - MicroBlaze embedded system
(Legible labels: Spartan-IIE FPGA; MicroBlaze Soft Processor;
MicroBlaze Embedded System)


Industrial 

No longer must large workforces be trained to monitor a specific area
of a production plant. Today, a series of microcontroller-based
monitoring modules, often linked to a central station, replaces these
human counterparts. Microcontrollers work tirelessly around the clock
without lapses in concentration, requiring only the most minimal
maintenance and supervision.

Device | # I/Os | Advantages
NEC V850E/1A2 (32-bit microcontroller) | 100-pin package |
Motorola MC68HC912B32 (16-bit flash microcontroller) | 80-pin package |
Spartan-IIE XC2S600E | 514-pin package | 5x more I/O than NEC, 6x more than Motorola
Spartan-IIE XC2S300E | 329-pin package | 3x more I/O than NEC, 4x more than Motorola

Table 1 - Comparison of Spartan-IIE devices and traditional microcontrollers


Consumer

Walk into any electronics store and you will find a host of products
that use some kind of microcontroller: MP3 players, video recorders,
Web tablets, televisions, plasma displays, set-top boxes,
refrigerators, washing machines, telephones, answering machines,
ovens, toasters, printers, and scanners all offer added functionality
through the use of a microprocessor.

Figure 3 - FPC applications gallery


The Benefits of FPCs

The Xilinx FPC solution combines the low-cost Spartan-IIE FPGA family
with the compact, high-performance MicroBlaze 32-bit RISC processor
core. You also get complete embedded system tools (EST) support,
including GNU compiler and debugger; hardware and software
development tools for implementation, simulation, and verification;
and more than 30 fully parameterizable processor IPs. Overall, FPCs
offer a low-cost, high-performance, and easy-to-use solution. Among
the benefits of FPCs are:

• Reduced costs: By integrating your design onto a single device, you
  not only save time and effort, you also reduce your overall costs.
  Spartan-IIE FPGAs are the lowest cost programmable logic devices
  you can get. They also allow you to integrate costly board-level
  features such as DLLs, RAM, and a variety of I/O translators, as
  well as MicroBlaze, the industry’s fastest soft processor, into a
  single, compact, low-cost platform.


• More user I/Os: Spartan-IIE FPGA devices contain as many as 514
  user I/Os, with more than 70% additional capacity and the lowest
  cost per I/O among competing FPGAs in the same density ranges. This
  competitive advantage allows you to integrate more features and
  also shrink the form factor for each device.

• Customization: You can create a customized controller and
  peripheral set to meet your exact and evolving design requirements.
  Unlike other solutions, with FPCs you are no longer locked into a
  rigid, pre-selected set of peripherals, nor do you have to pay for
  unused function sets. In addition, Xilinx offers more than 30
  processor IPs to choose from.

• No obsolescence: Xilinx allows you to purchase the MicroBlaze soft
  processor core source code. This option guarantees product
  availability for any application you choose. You can also port the
  core across Xilinx product lines, or even target an ASIC device.


Conclusion 
FPCs from Xilinx deliver the highest I/O for low-cost processing
solutions. They let you create a customized controller and peripheral
set to meet your exact, and evolving, design requirements. Freed from
a fixed set of peripherals, you now have the power to custom tailor
your peripheral set to meet your needs. In addition, the opportunity
to purchase the MicroBlaze structural VHDL source code assures you of
product availability well into the future. FPCs reduce your overall
design cost and inventory while bringing you the highest performance
from your logic devices.

For more information on the FPC solution, visit www.xilinx.com/fpc/.
Memec Lead by Design Training

Embedded Processor Solutions Workshop

During this hands-on training course, learn how to use the new Embedded Development Kit software to architect processor-based Xilinx solutions.
Easy-to-follow lab exercises provide practical experience with:

• Defining your hardware system
• Adding peripherals
• Writing application code
• Performing simulation
• Implementing the design
• Debugging (in-system)

A MicroBlaze™ and Virtex-II Pro™ PowerPC™ version of the course focuses on the details of the processor development flow. During the
MicroBlaze class, you will experiment with the new Memec Design Spartan™-IIE LC Development Kit; likewise, the Virtex-II Pro PowerPC course
allows you to explore the features of the Memec Design Virtex-II Pro Development Kit.

Exclusive to Attendees: Specially Bundled Kits at Reduced Pricing!

With special kit pricing, you can continue your development efforts with the same hardware and software you used during the hands-on labs.

Virtex-II Pro PowerPC Development Kit
Explore the power of an embedded PowerPC processor, Rocket I/Os, and the most advanced FPGA architecture available.

• Virtex-II Pro Development Board
• Programming Cable
• Serial and MGT Loop-back Cables
• Power Supply
• Xilinx EDK Software
• Documentation and Reference Designs

Spartan-IIE MicroBlaze Development Kit
The ideal starter kit for developing MicroBlaze-based applications.

• Spartan-IIE LC Development Board
• Programming Cable
• Power Supply
• Xilinx EDK Software
• Xilinx ISE BaseX Software
• Documentation and Reference Designs

What Are You Waiting For?
Classes take place in March and April, in locations throughout North America.

Sign up now for the Embedded Processor Solutions Workshop nearest you.

Call 800.488.4133 ext.980 or register at www.insight.na.memec.com/embedded_workshops.
With the new Xilinx Embedded Design Kit, you can easily develop programmable, embedded,
hardware/software solutions that deliver optimum results, and you can do it faster than ever.


by Ravi Pragasam 

Marketing Manager, Embedded Processor Solutions 
Xilinx Inc. 

ravi.pragasam@xilinx.com 


The trouble with embedded design up to 
now has been that neither ASICs nor 
ASSPs — the traditional choices for embed- 
ded solutions — offer an ideal fit. In an 
ASIC flow, hardware (HW) limitation 
problems discovered late in the design cycle 
must be dealt with in the software (SW). 
This drawback is becoming increasingly 
critical now as physical geometries shrink, 
and as design rules and chip fabrication 
processes evolve. 

Likewise, ASSPs are also less than perfect 
because you must design your solution 
around standard elements, which often 
impose limitations. In addition, the non- 
standard elements of the ASSP usually don't 
add enough functionality to deliver a solu- 
tion that sets it apart from the competition. 
Typically, the resulting design has too much 
functionality here, and not enough there. 

Long design cycles involving extensive simulation, on the one hand,
and narrow market windows and changing industry standards, on the
other, often necessitate compromises in performance. The result is
often failure to meet original specification goals. What has been
lacking is a fast, effective way to integrate HW and SW flows to
produce a solution that brings out the full advantage of embedded
design.

The new Xilinx Embedded Development Kit (EDK) bridges the gap between
HW and SW flows by providing a single HW/SW design environment.
Fusing the two early in the design flow enables you to deliver an
embedded programmable system that meets your design specifications on
time, without compromising performance.


A New Era of System Design 

The EDK allows you to define the hard- 
ware and software platforms of your system 
using powerful tools based on the Platform 
Specification Format (PSF). PSF is an open 
format that provides an abstraction layer 
between the HW and SW sections so they 
can be tightly integrated during the defini- 


tion and development process. This cou- 





pling ensures that the hardware platform 
you create will be one a software platform 
can match. 

The EDK also enables you to rapidly 
match the programming environment and 
software capability to available hardware 
resources. 

For the software designer, the greatest 
advantage afforded by PSF is access to an 
embedded tool that allows HW/SW co- 
design, and which also interfaces with 
industry standard software tools, such as 
RTOS support from Wind River Systems, 
Linux support from MontaVista Software, 
and popular open source embedded tools
like GNU.


PSF Benefits 
• The PSF overcomes many of the most
common problems of the traditional
“over-the-wall” approach to HW/SW
interface definition and integration.
With the PSF, you can give a hardware
platform specification to a firmware or
software engineer without having to
wait until the hardware is available.

• The firmware and software developer
can then configure the SW platform
with full knowledge of the HW plat-
form: which devices are attached, the
memory maps, the register definitions,
and so on.

• Offering a single-environment approach
for HW/SW platform configuration,
PSF enables a rapid re-architecture of the
embedded programmable systems design.
This enables you to add to, or subtract
from, HW resources easily, with matching
SW in the modified architecture.

• The unified environment, common bus
structure, and common core libraries
enable you to easily use a Xilinx
MicroBlaze™ core, IBM PowerPC™
processor in a Virtex-II Pro™ FPGA, or
both, in a single design.

• Multi-core designs, homogeneous (multiple
MicroBlaze cores or multiple PowerPC
processors) or heterogeneous (mixed
PowerPC and MicroBlaze cores), in a
single design (multiple bus masters),
are also supported. 
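
As a rough illustration of what a shared platform description buys the software developer, the sketch below turns a small device table into C register definitions before any hardware exists. It is written in Python for brevity. The dictionary layout, the device names, the addresses, and the generate_header helper are all invented for this example; none of this is PSF syntax or an EDK tool.

```python
# Hypothetical hardware platform description. This is NOT PSF syntax;
# device names, base addresses, and register offsets are made up.
PLATFORM = {
    "uart0": {
        "base": 0xFFFF2000,
        "registers": {"RX_FIFO": 0x0, "TX_FIFO": 0x4, "STATUS": 0x8},
    },
    "timer0": {
        "base": 0xFFFF1000,
        "registers": {"CONTROL": 0x0, "LOAD": 0x4, "COUNT": 0x8},
    },
}


def generate_header(platform):
    """Return C #define lines giving the absolute address of every register."""
    lines = []
    for device in sorted(platform):
        spec = platform[device]
        # Emit registers in offset order so the header mirrors the memory map.
        by_offset = sorted(spec["registers"].items(), key=lambda item: item[1])
        for register, offset in by_offset:
            address = spec["base"] + offset
            lines.append("#define %s_%s 0x%08X" % (device.upper(), register, address))
    return lines


header_lines = generate_header(PLATFORM)
```

Joining header_lines with newlines gives a header that firmware can compile against while the hardware is still being built, which is the hand-off the first two bullets describe.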


The PSF is a one-of-a-kind specifica- 
tion format for both HW and SW. 
Because it is an open format, it is available 
for adoption by customers and third par- 
ties, enabling them to integrate custom 
peripherals, third-party IP cores, and 
tools. PSF provides a common tool chain 
for the HW/SW platform specification, 
generation, development, and debug, 
while supporting multimaster and mixed 
architecture designs. 

The Xilinx EDK and ISE 5.1i logic 
design tools provide you with the technol-
ogy to help you achieve your performance
requirements faster and better than ever.


EDK Contents 

The EDK includes the Xilinx Platform 
Studio (XPS), an all-encompassing design 
environment that permits you to define, 
configure, and generate a custom hardware 
and matching software/programming envi- 
ronment for either a stand-alone or RTOS- 
enabled programmable system. Hardware, 
software, and firmware developers who 
need to work in both domains can use the
XPS integrated design environment.

• Embedded system tools
- Xilinx Platform Studio
- Tools for specifying the hardware and
software platforms based on PSF
- Xilinx microprocessor debug (XMD)
- GNU tools for MicroBlaze and hard
embedded PowerPC cores in Virtex-II
Pro FPGAs (compiler and debugger)
- Support for simulation tools
- System generator for processors
(beta version)
- Board Support Package (BSP)
generator

• Interface and infrastructure IP cores
- Arbiters
- Memory controllers for external
memory interfaces
- More than 40 standard IP cores (such
as UART, GPIO, and timer/counter)
- Evaluation version of high-value
cores (such as 10/100 EMAC, single
channel HDLC controller, and serial
ATA L2)

• MicroBlaze 32-bit soft processor core

• Reference designs and examples

• Evaluation versions of Wind River and
ISE 5.1i software tools.


Conclusion 

With key products, such as MicroBlaze 
32-bit soft processor cores and Virtex-II Pro 
FPGAs, Xilinx offers a complete solution 
that resolves many of the design challenges 
presented by more traditional tools and 
technology. With the Virtex-II Pro device, 
Xilinx has ushered in a new era of system 
design in which SW/HW flows blend to 
take advantage of programmability and 
high-performance features — such as the 
multi-gigabit serial transceivers available in 
the silicon — in a way that enables solution 
providers to get ahead in their markets 
without making performance compromises. 

Working within a common HW/SW 
tool environment, you can rapidly con- 
struct a custom processor system consisting 
of processor, peripheral cores, and inter- 
connect bus in a Xilinx FPGA. You can also 
integrate your own custom IP cores into 
the processor system. 

And now, with the EDK, Xilinx has 
enabled the process of HW/SW co-design 
that has long been a dream of embedded 
system designers everywhere. The EDK 
(Part No. DO-EDK) is available now. For 
more information and updates to the 
EDK, visit www.xilinx.com/edk/. 

For more information about Xilinx 
embedded processor solutions, visit 
Processor Central at www.xilinx.com/ 

Using “software-compiled system design” for programmable systems, we show how you
can combine software and hardware design methodologies, and development tools,
from system-level specification to direct implementation and run-time configuration.


by Chris Sullivan 

Director of Strategic Alliances 
Celoxica, Ltd. 
chris.sullivan@celoxica.com 


As FPGAs have developed from logic pro- 
totyping devices into fundamental system 
elements, there has been enthusiasm for the 
concept of using high-performance proces- 
sors closely coupled to or immersed inside 
the FPGA fabric for applications that 
require unrivalled levels of performance 
and flexibility. 

In this architecture, the microprocessor 
typically runs system applications while the 
FPGA manages computationally intensive 
tasks. Offloading processor-intensive tasks 
to hardware reduces the load on the proces- 
sor and delivers greater bandwidth. It can 
also remove bottlenecks by migrating algo- 
rithms to hardware. In short, FPGAs have 
evolved into fully programmable systems 
and fast co-processors, rather than just flex-
ible relations of the ASIC.

Existing design examples that combine
Xilinx FPGA solutions with development 
tools from Celoxica and Wind River 
Systems already provide unique and tangi- 
ble proof that this concept and design flow 
works. They form a core element of pro- 
grammable system co-design, delivering a 
quick, efficient, and verifiable route to 
device-optimized implementation. 

“Software-compiled system design” pro- 
vides the capability to drive partitioning, 
co-verification, and direct implementation 
from the system specification. Moreover, it 
allows engineers to jump-start their system 
and software application development 
before actual hardware is available, thereby 
enabling concurrent design, saving valuable 
development effort and delivering the best 
time to market. Starting at the system level,
verification becomes a whole-design life
cycle activity, and by enabling system-level
partitioning, you can realize a better quality
of design (QoD): right the first time, more
of the time.


A Design Example 

An early design example — developed by 
Celoxica, Wind River, and Xilinx — focused 
on the design methodology, tools, and run-
time environments that can be applied to
programmable systems. Specifically, we
developed a triple-DES encryption and
decryption engine to compare a program-
mable system solution with an alternative
software implementation. A compressed
video stream formed the basis of test data.

Figure 1 - PPMC750 and RC1000 connected
and functioning as a prototyping platform 
for programmable systems 


Hardware 

The selected hardware architecture was 
initially based around a discrete IBM 
PowerPC™ processor and a Xilinx 
Virtex™ FPGA — effectively a first- 
generation Virtex-II Pro™ prototyping 
platform (Figure 1). We used a PPMC750 
single-board computer from Wind River 
and Celoxica’s RC1000 — a Virtex-based 
PCI card with a Xilinx FPGA and 8 Mb of 
local memory (Figure 2). 

Subsequently, we deployed a newer 
reference platform using the PowerPC 
405GP processor (Figure 3). In addition 
to PCI and PMC (PCI mezzanine card) 
connectors, this platform also featured a 
custom connector that allowed an FPGA 
daughtercard to be plugged directly onto 
the processor peripheral bus, thus provid- 
ing even closer coupling, lower latency, 
and higher throughput. 

Various FPGA daughtercards can be 
used with this reference platform, such as 
the ADM-XRC from Alpha Data 
Systems, Xilinx Durango, or the Proteus 
card from Wind River. 

Wind River’s Proteus card is equipped
with a Xilinx Virtex device and 4 MB of
on-board SSRAM. The FPGA PMC can
interface with any standard PMC slot
(with an image containing a PCI soft
core) or the microprocessor local bus on
Wind River’s SBC405GP single board
computer. A direct processor bus
connection gives a substantial perform-
ance boost compared with PCI.

Figure 2 - Above, the RC1000 is a Celoxica PCI slot-based card, which has a
secondary bus on-board for connecting other PMC based FPGA-cards as well as 
the existing FPGA on-board. Below the board is a schematic of the RC1000. 


The design platform is completed 
by a simple DAC interface, enabling 
the FPGA card to drive a video monitor 
or a flat-panel LCD for standalone
demonstrations.


Development Tools 

The 405GP processor runs Wind River's 
VxWorks™ real-time operating system 
(RTOS), together with hardware bring-up 
tools that allow close control of the boot 
cycle for the board during the time period 
before control is passed to the RTOS. 

The PAVE Framework API from 
Xilinx was used to program the FPGA 
with configuration files. 

The system partition and the application
content for the FPGA were developed
using Celoxica’s Nexus co-design
environment and DK Design Suite.


Nexus and DK 

Nexus is a powerful co-design environment 
for programmable systems. It supports 
system partitioning, co-verification, and
co-simulation. Nexus allows you to fully
explore the design space to identify
optimal system partitioning. System
functionality can be simulated
between hardware and software using
multiple languages such as C, C++,
Handel-C, SystemC, and HDLs.
These models can be used throughout
the design for co-verification. Nexus
communicates directly during simula-
tion with popular, third-party, hard-
ware RTL simulators and software ISS
environments.

Figure 3 - Virtex-II prototyping platform (labels: IBM PowerPC microprocessor; Xilinx Virtex-II FPGA; Wind River SBC405GP; Proteus FPGA daughtercard)

Using DK, the resulting code may be 
debugged using a familiar integrated 
development environment (IDE), and 
applications are compiled direct to the 
FPGA fabric using device-optimized 
synthesis. VHDL and Verilog output 
is also supported for traditional RTL
synthesis.


Handel-C 

We selected the Handel-C language for 
hardware implementation, as it provides a 
common level of abstraction and a com- 
mon language base for both the hardware 
and software. The language has simple 
extensions to ANSI-C (Figure 4) that can 
be leveraged to quickly create applications 
that fully exploit the capabilities of a pro- 
grammable system, without compromising 
performance or area. 

As a fully synthesizable language, every-
thing that can be described in Handel-C has
a translation to hardware (Figure 5). The code
illustrates concepts and extensions, such as 
par, chan, synchronization, functions, 
pointers, structures, interfacing, and extern- 
ing pure C functions for simulation. 
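
The arithmetic at the heart of Figure 5 can be checked in software before any hardware exists. The Python sketch below models the same complex multiplication, assuming that each result wraps to a WIDTH-bit two's complement value as a fixed-width register would. The helper names wrap_signed and complex_multiply are illustrative only and are not part of Handel-C or DK.

```python
WIDTH = 9  # same data width as the WIDTH constant in Figure 5


def wrap_signed(value, width):
    """Keep the low `width` bits and reinterpret them as two's complement."""
    mask = (1 << width) - 1
    value &= mask
    if value >= 1 << (width - 1):
        value -= 1 << width
    return value


def complex_multiply(a, b, width=WIDTH):
    """Multiply two (re, im) pairs, wrapping each part to `width` bits."""
    re = a[0] * b[0] - a[1] * b[1]
    im = a[0] * b[1] + a[1] * b[0]
    return wrap_signed(re, width), wrap_signed(im, width)


product = complex_multiply((3, 4), (5, -2))    # (23, 14)
overflow = complex_multiply((20, 0), (20, 0))  # 400 does not fit in 9 bits
```

The second call shows why the data width matters: 400 is outside the 9-bit signed range of -256 to 255, so under the wrapping assumption the real part comes back as -112.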

With a simple timing model, each 
assignment in a program takes one clock 
cycle to execute, giving you full control 
over what is happening in the design at any 
point in time. Results are predictable and 
controllable, and the facility for complex 
sequential control flows means there are no
state machines to design. 
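
The timing model lends itself to a quick back-of-the-envelope check. The following Python sketch counts clock cycles for a nested description of a design, assuming one cycle per assignment, that sequential statements add up, and that a par block finishes when its slowest branch does. It is a simplified model for illustration (it ignores channel communication, loops, and delays), and the block notation is invented here.

```python
def cycles(block):
    """Clock cycles for a block under the one-cycle-per-assignment model.

    A block is the string "assign", or a tuple ("seq", [blocks]) or
    ("par", [blocks]).
    """
    if block == "assign":
        return 1
    kind, children = block
    costs = [cycles(child) for child in children]
    if kind == "seq":
        return sum(costs)                    # statements run one after another
    if kind == "par":
        return max(costs) if costs else 0    # branches run side by side
    raise ValueError("unknown block kind: %r" % (kind,))


three_in_sequence = cycles(("seq", ["assign", "assign", "assign"]))
three_in_parallel = cycles(("par", ["assign", "assign", "assign"]))
mixed = cycles(("par", [("seq", ["assign", "assign"]), "assign"]))
```

Three assignments cost three cycles in sequence but only one inside a par block, which is the kind of control over timing the paragraph above describes.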


Run-Time Environment 

Typically, the FPGA is connected to the 
microprocessor in a memory-mapped or 
programmed I/O fashion, but this creates 
the challenge of needing to develop and
redevelop individual communications pro-
tocols and data-marshalling routines for
each application. This problem is overcome
by using DSM (Data Streaming Manager),
a portable co-design API developed for
hardware/software integration in program-
mable systems.

Figure 4 - ANSI-C/Handel-C comparison


Data Streaming Manager 

DSM is a portable hardware/software co- 
design API that offers a simple and trans- 
parent interface for transferring multiple 
independent streams of data between hard- 
ware and software. DSM supports system 
partitioning and final implementation; it is 
both bus/interconnect and OS-independent; 
and for the developer, it simplifies the inte- 
gration between the hardware and software 
(Figure 6). 

As an example, the hardware function 
reads parameters from an input port and 
then writes the results to an output port. 
All the complexities of receiving com-
mands over the PCI or bus, routing param-
eters to the appropriate hardware function,
and then routing the responses back to the
calling software thread are handled trans-
parently by the hardware side of the
DSM.

[Figure 4 labels, Handel-C extensions: macro procedures; macro expressions; arbitrary width variables; interfaces; RAM & ROM]

On the software side, there are two
main parts to the DSM: the control/set-
up phase and then the specific usage of
the custom hardware function.
Essentially, information about the FPGA
configuration and available functions is
retrieved by reading a memory-mapped
register. User-defined identifiers (called
function addresses) are assigned to each
available hardware function, and these
function addresses are later used to
communicate between the application
software and the functions implemented
in hardware.

Using this methodology, the optimal
system partition can be identified by
porting blocks of software to
Handel-C, for hardware prototyping, test-
ing, and verification. DSM’s portability
means that multiple partitions can be rap-
idly evaluated, tested, and verified with the
software used as a testbench throughout. 
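
The two software-side phases can be pictured with a small model. The Python sketch below is a toy stand-in written for this explanation, not the DSM API: the class name, its methods, and the function addresses are all hypothetical. It shows a setup phase that binds function addresses to functions, and a usage phase in which the application refers to a hardware function only by its address.

```python
class ToyStreamManager:
    """Illustrative stand-in for a streaming manager; NOT the DSM API."""

    def __init__(self):
        self._functions = {}

    def register(self, function_address, hardware_function):
        """Setup phase: bind a user-defined function address to a function."""
        if function_address in self._functions:
            raise ValueError("function address already in use")
        self._functions[function_address] = hardware_function

    def available(self):
        """What software would learn by reading the configuration register."""
        return sorted(self._functions)

    def call(self, function_address, *params):
        """Usage phase: route parameters to the function, return its response."""
        return self._functions[function_address](*params)


ADD_ADDR = 0x01    # hypothetical function addresses
SCALE_ADDR = 0x02

manager = ToyStreamManager()
manager.register(ADD_ADDR, lambda a, b: a + b)   # stands in for a hardware block
manager.register(SCALE_ADDR, lambda x: 2 * x)
result = manager.call(ADD_ADDR, 2, 3)
```

Because the application only ever names a function address, the function behind it can be moved between software and hardware without changing the calling code, which is what makes rapid evaluation of different partitions practical.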
DSM also provides a functionally accu- 
rate simulation environment that allows 
ANSI-C programs and Handel-C applica- 
tions to interact using the DSM (Figure 7). 
The ANSI-C program is run as a native exe- 
cutable on the PC. The Handel-C applica- 
tion is run using the simulation and 
debugging capability of Celoxica’s co-design 
environment. A utility is provided through 
which the data passing between the applica- 
tions may be monitored to assist with 
debugging (Figure 8). All of the API func- 
tions are provided, allowing complete sys- 
tem development to begin — without the 
development platform being available. Once 
working, the application can be easily trans-
ferred to the target platform for final testing.


Triple DES Encryption 

Our design example was based around 
streaming of compressed and encrypted 
video data. The Autodesk FLI file format 
was used to compress the video, and an FLI
player, developed by Celoxica, was imple-
mented in FPGA hardware connected to
the processor via the PCI bus. To bench-
mark the design, we loaded a cartoon ani-
mation into the memory on the processor
board. A triple DES algorithm described in
C ran on the PowerPC microprocessor.

The same C source code was ported to
Handel-C, optimized in terms of control-
ling parallelism and timing, and compiled
to a gate-level design that was device-
optimized for the target FPGA.

    #define WIDTH 9                            // parameterisable data widths

    typedef struct                             // complex number type
    {
        signed WIDTH re;
        signed WIDTH im;
    } complex;

    set clock = external "ClockSource";        // single synchronous clock domain (ClockSource)

    void main()
    {
        chan complex cDataIn[2], cDataOut;     // communication channels

        while (1)
            par                                // parallel hardware
            {
                DataIO(cDataIn, &cDataOut);    // data input/output function
                Transform(cDataIn, &cDataOut); // data transform function
            }
    }

    void Transform(chan complex *pcDataIn, chan <complex> *pcDataOut)
    {
        complex DataInReg[2], DataOutReg;      // complex data registers

        par (i=0; i<2; i++)                    // replicated par{}
        {
            pcDataIn[i] ? DataInReg[i];        // read complex numbers in parallel
        }
        par
        {   // single cycle multiplication of two complex numbers
            DataOutReg.re = DataInReg[0].re * DataInReg[1].re - DataInReg[0].im * DataInReg[1].im;
            DataOutReg.im = DataInReg[0].re * DataInReg[1].im + DataInReg[0].im * DataInReg[1].re;
        }
        *pcDataOut ! DataOutReg;               // write complex number
    }

    void DataIO(chan complex *pcDataIn, chan <complex> *pcDataOut)
    {
        complex DataInReg[2], DataOutReg;      // complex data registers

        DataInput(&DataInReg[0]);              // data input function
        DataInput(&DataInReg[1]);              // data input function
        par (i=0; i<2; i++)                    // replicated par{}
        {
            pcDataIn[i] ! DataInReg[i];        // write complex numbers in parallel
        }
        *pcDataOut ? DataOutReg;               // read complex result
        DataOutput(&DataOutReg);               // data output function
    }

    #ifdef SIMULATE   // Simulation Testbenches

    void DataInput(complex *pDataIn)
    {
        long Data[2];
        scanf("%d%d", &Data[0], &Data[1]);     // ANSI-C function
        pDataIn->re = adjs(Data[0], WIDTH);    // type conversion from ANSI-C call
        pDataIn->im = adjs(Data[1], WIDTH);    // type conversion from ANSI-C call
    }

    void DataOutput(complex *pDataOut)
    {
        long Data[2];
        Data[0] = adjs(pDataOut->re, 32);      // type conversion for ANSI-C call
        Data[1] = adjs(pDataOut->im, 32);      // type conversion for ANSI-C call
        printf("DataOut = %d %d\n", Data[0], Data[1]);   // ANSI-C function
    }

    #else   // Hardware Implementations

    void DataInput(complex *pDataIn)
    {
        interface port_in (signed WIDTH in) DataInPort();   // WIDTH-bit input port
        pDataIn->re = DataInPort.in;
        pDataIn->im = DataInPort.in;
    }

    void DataOutput(complex *pDataOut)
    {
        signed WIDTH Data;
        interface port_out () DataOutPort (signed WIDTH OutPort = Data);   // WIDTH-bit output port
        Data = pDataOut->re;
        Data = pDataOut->im;
    }
    #endif

Figure 5 - Handel-C code for single cycle multiplication of two complex numbers

A 64-bit key was used for the encryption,
which subsequently allowed correct decryp-
tion of the video stream. Implementing
three DES algorithms in sequence (triple 
DES encryption) provided further increases 
in this standard’s security. Three 64-bit keys 
were used for an encrypt/decrypt/encrypt 
cycle in a triple DES pass, and the same keys 
allowed  decrypt/encrypt/decrypt for 
decrypting the data. 
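
The encrypt/decrypt/encrypt ordering is easy to get wrong, so a small model helps. The Python sketch below shows only how the three passes compose and why the reverse sequence with the same keys recovers the data. The block cipher in it is a deliberately trivial placeholder (add the key, rotate the bits), not DES, and every function name is made up for this illustration.

```python
MASK64 = (1 << 64) - 1


def toy_encrypt(block, key):
    """Placeholder 64-bit block cipher (NOT DES): add key, rotate left 13."""
    x = (block + key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64


def toy_decrypt(block, key):
    """Inverse of toy_encrypt: rotate right 13, subtract key."""
    x = ((block >> 13) | (block << 51)) & MASK64
    return (x - key) & MASK64


def ede_encrypt(block, k1, k2, k3):
    """Triple pass: encrypt with k1, decrypt with k2, encrypt with k3."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, k1), k2), k3)


def ede_decrypt(block, k1, k2, k3):
    """Reverse pass: decrypt with k3, encrypt with k2, decrypt with k1."""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, k3), k2), k1)


plaintext = 0x0123456789ABCDEF
keys = (0x1111111111111111, 0x2222222222222222, 0x3333333333333333)
ciphertext = ede_encrypt(plaintext, *keys)
recovered = ede_decrypt(ciphertext, *keys)
```

One consequence of this ordering is that setting all three keys to the same value collapses the triple pass into a single encryption, because the middle decryption undoes the first encryption.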

This was a robust test of performance. 
The algorithm was inherently sequential in 
software, but it could be heavily pipelined 
for a hardware implementation. 

To measure the performance improve- 
ment, we played a cartoon with each com- 
pressed frame being encrypted, decrypted, 
and displayed on a VGA monitor. Both 
hardware and software implementations 
were displayed together. They were trig- 
gered to start simultaneously, with the 
hardware version programmed to cycle 
continuously until the software implemen- 
tation finished. Processed data from the 
microprocessor was fed to the FPGA,
which, as well as performing DES encryp-
tion/decryption, was programmed to gen-
erate VGA signals. Output from both the
hardware and software implementations
was merged to form a composite image on
the monitor.

Figure 6 - DSM Data Streaming Manager

Figure 7 - DSM simulation environment

Figure 8 - DSM Sim Monitor for assisted debugging

Table 1 - Performance statistics for triple DES encryption/decryption

                                Encryption               Decryption
                                Software    Hardware     Software    Hardware
Elapsed time for 1 Mb of data   5558.8 ms   424.8 ms     5562.9 ms   424.7 ms
Cryptography rate               1.51 Mbps   19.7 Mbps    1.51 Mbps   19.8 Mbps


Performance Comparison 

A test harness enabled triple DES per- 
formance to be benchmarked by stream- 
ing data into either the software or 
hardware encryption algorithm. 

Theoretical performance for the 
FPGA was calculated as follows: The 
triple DES implementation produced a 
64-bit word every 19 clock cycles, giving 
a data throughput of 85.6 Mbps for a 
device running at 25.4 MHz. 
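
The theoretical figure follows directly from the word size, the cycles needed per word, and the clock rate. As a quick check, here is the same calculation in Python; the function name is only a label for this example.

```python
def throughput_mbps(bits_per_word, cycles_per_word, clock_mhz):
    """Mbps for a design that emits one word every `cycles_per_word` clocks."""
    return bits_per_word * clock_mhz / cycles_per_word


theoretical = throughput_mbps(64, 19, 25.4)
rounded = round(theoretical, 1)  # 85.6
```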

Actual performance was profiled using 
WindView, a diagnostic tool from Wind 
River that enables visualization and 
analysis of performance and timing 
issues in embedded systems. It allowed 
triggers to be set at different points in the 
code, and then provided accurate 
timing information for each trigger
event. Performance statistics are detailed
in Table 1.

Table 2 - Performance comparison
of hardware and software encryption
(table body not recoverable; surviving entries: Platform, Clock Speed, Performance, SBC405, 300 MHz)


This test scenario showed an FPGA 
throughput about 13 times faster using 
hardware than running software on a 
PowerPC microprocessor. Nevertheless, it 
was still about a quarter of the theoretical 
maximum rate. This indicated that the full 
benefits of placing core routines in hard- 
ware might be compromised by other sys- 
tem bottlenecks. Further analysis showed 
that there were overheads associated with 
offloading functionality into hardware. 
These overheads were associated with 
RAM access latency and/or bus speeds. 
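
These ratios can be reproduced from the elapsed times in Table 1. The Python sketch below does so under one assumption that the article does not state: that the "1 Mb" of test data is 2^20 bytes (8,388,608 bits), which is the size that reproduces the published rates. The variable and function names are chosen for this example.

```python
DATA_BITS = 8 * 2**20          # assumption: "1 Mb" means 2**20 bytes of data
THEORETICAL_MBPS = 85.6        # theoretical FPGA throughput quoted in the text


def rate_mbps(elapsed_ms):
    """Cryptography rate in Mbps for DATA_BITS processed in `elapsed_ms`."""
    return DATA_BITS / (elapsed_ms / 1000.0) / 1e6


software_encrypt = rate_mbps(5558.8)
hardware_encrypt = rate_mbps(424.8)
hardware_decrypt = rate_mbps(424.7)

speedup = 5558.8 / 424.8                                 # hardware vs software
fraction_of_theory = hardware_encrypt / THEORETICAL_MBPS
```

Under that assumption the software rate rounds to 1.51 Mbps and the hardware rates to 19.7 and 19.8 Mbps, the speedup is a little over 13, and the hardware reaches roughly 23% of the theoretical rate, in line with the "about a quarter" quoted above.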

We also calculated the performance of 
hardware and software encryption in the 
cartoon demonstration. Results demon- 
strated that the hardware performed 22 
times faster on a 15 times slower clock, as 
shown in Table 2. 


Following more detailed partitioning 
analysis, performance closer to the theoret- 
ical limits might be realized by removing 
code and functionality that are not directly 
associated with the triple DES algorithm 
(for example, the FLI decoder, frame 
buffer, and VGA driver). Better perform- 
ance would also be achievable by connect- 
ing the FPGA directly to the processor bus 
in a memory-mapped fashion rather than
across the PCI bus.


Conclusion 

The performance analysis results demon- 
strated significant improvements in overall 
system performance and quality of design. 
The results were achieved using a software- 
compiled system design methodology — 
specifically developed for programmable 
systems — that consistently delivered the 
fastest time to market (some 50% to 75% 
advantage in design time) without com- 
promising performance or area. 

For example, using the selected devel-
opment tools and run-time environment,
the FLI player took 10 person-days to
implement, as did the triple-DES
functionality. On the other hand, inte-
grating these two blocks to produce the
cartoon demonstration took just half a
day. Moreover, you can very quickly
explore the design space, experiment, and
analyze different hardware/software
trade-offs, and rapidly implement and
prototype the system.

Coupling Celoxica’s co-design tech-
nology with high-performance profiling
tools in the development tool chain
enabled further performance boosts and
time-to-market efficiencies. Overall
improvements in the quality of design
were realized by more informed and accu-
rate partitioning decisions, better up-
front system verification, and by
maximizing the speed gains of hardware
implementation while minimizing the
negative impact of transferring data
between the FPGA and microprocessor.

The bottom line is that these system-
level design qualities offer real and com-
petitive advantages for designers of
programmable systems who want to move
to volume production.

NMI Electronics’ CPU deployment module
uses Xilinx Spartan-II and Virtex-II FPGAs
to achieve high levels of customization.


by Kevin Heawood 

Vice President, Strategic Marketing 
Intrinsyc Software, Inc. 
kheawood@intrinsyc.com 


Typically, embedded systems are designed 
around a specific CPU architecture that 
comprises a 32-bit high-performance 
processor unit, volatile and non-volatile 
memory, and a set of peripherals and inter- 
faces specific to that system or to a particu- 
lar application. With on-chip clock speeds 
reaching 400 MHz to 700 MHz and 
board-level clock speeds as fast as 133 
MHz, system design has become increas- 
ingly time-consuming, complex, and risky. 

Using a single-board computer (SBC) 
or companion chips and interface logic to 
reduce risk can be difficult, however, as it is 
not always possible to find an SBC with the 
correct peripheral mix and the interfaces
that go with a particular CPU. Even if you 
do find the right SBC, the peripherals may 
be too expensive for your application, or 
otherwise inappropriate. 

NMI Electronics Ltd. has developed an 
SBC, or more specifically, a deployment 
module, that greatly simplifies system 
design. The module contains all the main 
components of a 32-bit CPU system, 
including volatile and non-volatile memo-
ry, but it uses either a Xilinx Spartan™-II
or Virtex™-II FPGA to provide a com-
pletely programmable peripheral set and
an interface that can be specifically tailored
to any application (Figure 1).


Programmable Interface and Companion Chip 
Most SBCs have a predefined interface and
set of functions. The interface is usually
based on an industry standard, such as
PC/104, or on a combination of a standard 
interface and a custom interface defined by 
the board manufacturer. 

Peripheral functionality is normally pro- 
vided by standard chipsets and is deter- 
mined according to whatever the device or 
CPU manufacturer considers the most 
commonly requested functions. Beyond a 
few basics (for example, baud rates on 
UARTs or display resolutions), the mix of
peripheral functions is neither flexible nor 
programmable, and so may be inconven- 
ient or inappropriate for your application. 

Using an FPGA effectively eliminates 
these restrictions. The FPGA can be placed 
onto the CPU local bus and closely cou- 
pled with the memory subsystem, thereby 
creating a highly programmable compan-
ion and interface chip.

Figure 1 - µEngine general architecture (labels: JTAG, debug and test connector; 12 CPU-specific pins)


Interfaces 
This FPGA system architecture allows you 
to implement the interface you require to 
the rest of the product hardware in a way 
that provides the optimum solution for your 
application. NMI supplies a number of 
standard interfaces, including the following: 

• PCI
• ISA
• PCMCIA.

It is also possible to support the CPU
local bus or a custom-designed interface.


Peripherals 
Because almost every application has a 
unique set of requirements, the peripherals 
required for each specific system are as 
diverse as the applications themselves. 
Using the FPGA as the companion chip to 
fulfill these requirements, it is now relative- 
ly simple to mix and match a range of 
peripherals in a way that meets the exact 
needs of your application. 
Examples of peripherals that can be
included in the FPGA are:

• UARTs
• SPI, I2C, and AC97 serial interfaces
• Display controller (LCD or CRT)
• Stepper motor controller
• Camera frame grabber
• PCI slave and host interfaces
• PC/104 interface.


Accelerators 

You can also place application-specific 
accelerators (co-processors) into the FPGA. 
These accelerators assist the CPU in the
performance of specific functions.
Examples of such accelerators include:

• 2D display assistance
• Hardware cursor support
• DSPs.


Companion Chip Application Example 

Imagine you are working on an automobile 
navigation system that requires high-
performance graphics, an interface to CAN
bus, and two serial ports: one for interface
to a mobile phone and one for diagnostics.
It may not be possible to find a CPU and
companion chip with that particular
peripheral mix.

Figure 2 - Example µEngine FPGA design (labels: CPU local bus on the MicroEngine; CPU interface logic, CPU type set at synthesis time)

By using an FPGA-based companion 
chip, however, you can implement a PCI 
host bridge as an interface to a high- 
performance standard graphics chip, plus a 
CAN controller, and two UARTs, all with- 
in the FPGA. You could even implement a 
DMA controller to feed the graphics chip 
and service the CAN controller and 
UARTs, which frees up the CPU to per-
form more compute-intensive tasks.


NMI MicroEngine 

Using the FPGA as the basis for the system 
interface and peripheral companion chip also 
makes it possible to isolate the CPU/memo- 
ry/FPGA subsystem and mount this onto a 
small printed circuit board. 

In fact, this is what we at NMI have
done with our MicroEngine (µEngine).
The µEngine is a small form factor deploy-
ment module that contains all the key ele-
ments of a high-performance, processor-
based system (CPU, flash memory,
SDRAM), but which uses a Xilinx Spartan-
II or Virtex-II FPGA to provide the system
interface and peripheral functions. This
arrangement provides a totally flexible core
module, and it enables you to include pre-
cisely the peripherals and interfaces you
need for your system. It also means that
you can use the same basic board in a wide
range of equipment.

[Figure 2 labels, continued: external interfaces (from the MicroEngine); PC/XT interface bus to PC/104; NMI ISA bus interface IP core; external interrupt]

In addition, the µEngine addresses the
conceptually simple, yet practically more
difficult, problem of designing microproces-
sor systems with high-speed external clocks
and buses. The µEngine is, in its
own right, a self-contained, pre- 
tested, high-performance micro- 
processor subsystem. All it needs 
to “run” is power. 

In other words, all you need 
to implement your application is 
a baseboard containing a power 
supply and the specific interface 
logic to suit your application. In 
many cases, the baseboard can be 
relatively straightforward, and 
use lower technology design and 
less stringent manufacturing 
rules than the high-speed µEngine design.

The connection between the µEngine
and the baseboard is made via an industry- 
standard, 144-pin, SODIMM connector 
that carries both power and logic signals. 
Eighty-eight pins of the interface are con- 
nected to the FPGA and are completely 
user-programmable. Twelve CPU-specific 
pins carry such dedicated functions as 
serial ports, ADCs, DACs, or USB, 
depending on the CPU deployed on the 
µEngine (Figure 2).

The image for the FPGA is held in the 
µEngine’s flash memory and is completely
reprogrammable. You can even place more
than one FPGA image on a µEngine,
enabling it to support multiple baseboards. 
It does this by means of a mechanism that 
identifies the type of baseboard it is plugged 
into at power-up and automatically loads 
the correct FPGA for the application. 
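
The article does not describe how the identification mechanism works, so the following Python sketch is only a guess at the general shape of such a scheme: a table of stored images keyed by a baseboard identifier that is read at power-up. The identifier values, image names, and the select_fpga_image helper are hypothetical.

```python
# Hypothetical table of FPGA images held in flash, keyed by baseboard ID.
FPGA_IMAGES = {
    0x01: "navigation_baseboard.bit",
    0x02: "industrial_baseboard.bit",
}


def select_fpga_image(baseboard_id, images=FPGA_IMAGES):
    """Return the stored image for the detected baseboard, or raise if unknown."""
    try:
        return images[baseboard_id]
    except KeyError:
        raise LookupError(
            "no FPGA image stored for baseboard ID 0x%02X" % baseboard_id
        )


chosen = select_fpga_image(0x02)
```

Failing loudly on an unknown identifier is a deliberate choice in this sketch: loading the wrong image could drive the user-programmable pins in a way the baseboard does not expect.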

In addition, the ability to isolate the
CPU from the baseboard allows you to
plug different CPU-based µEngines into
the same baseboard. This is one of the
µEngine architecture’s greatest advantages.
It enables upgrades of CPUs as well
as the use of CPUs of varying performance
levels, such as an application that
requires modest graphics performance in
an entry-level product and high performance
in another, high-end product. For
instance, the entry-level product might be
based on a Hitachi SH3 (without FPU)
100 MHz µEngine (Figure 3), and the
high-end product might be based on a
Hitachi SH4 (with FPU) 200 MHz
µEngine (Figure 4). Both units would use
the same baseboard.


Figure 3 - Hitachi SH3 µEngine with Virtex XCV100

Figure 4 - Virtex XC2V1000 implemented
on Hitachi SH4 µEngine


FPGA Intellectual Property 
FPGA IP cores for the µEngine are
available from many sources:
• NMI provides a wide range of proven
  IP (for example, PCI host bridge, display
  controllers, UART, frame grabber,
  2D graphics accelerator).
• Xilinx LogiCORE™ IP
• Third-party IP
• Your own IP
• Custom-developed IP.

These elements can be freely mixed in
the µEngine to produce the unique
functionality required for any application. The
NMI deployment module has many features
that simplify integration into the
final system. For instance, because most
systems using high-performance CPUs are
running an embedded operating system,
such as Windows® CE.NET, we have
provided Windows CE software
drivers for all of our FPGA IP on
the full range of µEngines.

What’s more, you can populate
a µEngine with various
FPGA densities: 50K to 200K
gates on Spartan-II FPGAs, 50K
to 300K gates on Virtex-E
FPGAs, and 250K to 1M gates
on Virtex-II FPGAs. This variable
gate population makes the
µEngine the most cost-effective
solution for almost any application,
interface, peripheral, or
accelerator mix.
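
As a rough illustration of how the density ranges above might guide a first device choice, the sketch below lists the families whose quoted upper limit can hold a given gate count. The function name and the simple "fits if not above the upper limit" rule are assumptions made for this example; a real selection would also weigh I/O count, memory, speed grade, and price.

```python
# Gate-density ranges quoted in the article, in gates.
DENSITY_RANGES = {
    "Spartan-II": (50_000, 200_000),
    "Virtex-E": (50_000, 300_000),
    "Virtex-II": (250_000, 1_000_000),
}


def families_for(gates_needed):
    """Return the families whose quoted upper limit can hold the design."""
    return [
        name
        for name, (_low, high) in DENSITY_RANGES.items()
        if gates_needed <= high
    ]


if __name__ == "__main__":
    for gates in (150_000, 280_000, 600_000):
        print(gates, families_for(gates))
```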

Lastly, to ease portability from one
FPGA device to another, our FPGA designs
use only high-level description languages.


Conclusion 
NMI developed the µEngine deployment
module through imaginative use of FPGAs
in CPU-based systems, creating a
high-performance module that provides
extraordinary hardware flexibility and
upgradeability. The availability of FPGA
IP and reference designs facilitates rapid
and low-risk development of new products
and applications, allowing companies to
focus on adding value, rather than having
to reinvent the core technology.

NMI offers several development platforms,
enabling you to easily evaluate the
µEngine and its associated IP.


Editor's note: Since this article was written,
NMI Electronics was purchased by Intrinsyc
Software Inc. For more information on the
innovative use of Xilinx FPGAs in µEngines, visit
www.intrinsyc.com/products/microengine/.
by Jane S. Donaldson

President

Annapolis Micro Systems, Inc.
jdonald@annapmicro.com


DSP developers and their customers know
FPGA-based processing outperforms
conventional processors on a board-for-board
comparison, resulting in significant improvements
in processing speed, size, weight, power, and
costs. Your FPGA design can be a customized
parallel processing chip, specifically crafted for
a particular application, accelerating the
application to run in hardware and at hardware
speeds far faster than could be achieved
with software on a generic processor.





• Process data in real time, on site, saving
  all the time and money involved in data
  collection and off-site processing.
• Modify the processing by simply
  reconfiguring the chip (by downloading
  a different FPGA file) to fix bugs,
  to adapt to a new set of interface
  requirements, or to modify the processing
  in response to application
  input data or processed results.
• Deliver new applications in place, with
  no human on-site intervention, by any
  means of file transfer, including
  Internet, internal network, hard drive
  storage, smart card, or wireless modem.
To deploy your application quickly to
meet customer demands, you need
commercially available hardware with the
latest Xilinx FPGAs, with plenty of gates,
ample memory, and fast standard I/O
options like Fibre Channel 2 and
1.5 GHz A-to-D input.

You need a quick and easy way to develop,
modify, and test your applications.
With VHDL, Verilog, or schematics, even
the most experienced ASIC designers need
many months to develop applications using
upwards of 40 million gates for a single
VME or PCI slot.
You can jumpstart your DSP design
process, saving time and money, by
using the eighth-generation, commercial
off-the-shelf (COTS), general purpose
Xilinx Virtex™-II FPGA-based hardware
from Annapolis Micro Systems, Inc. You
can develop at the application level with
the easy-to-learn CoreFire™ FPGA
Design Enabler. It’s loaded with
high-performance IP modules, created for your
use by FPGA application design experts.


Meet the Demand for Real-Time
DSP Applications
Managing digital signal processing data in
real time for applications like radar and
image processing is very demanding. You
need:
• High-speed real-time processing
• Very fast data rates
• A combination of complex and real
  data types
• Integer and floating point data
  representations and computation
• Variable and changing data path sizes

When you implement your digital signal
processing application in a Xilinx
Virtex-II FPGA, you build a customized,
parallel-processing design that outperforms
both general-purpose processors and
digital signal processing chips. Some of the
Virtex-II features that enable this very high
performance are:
• Chip performance in excess of
  300 MHz
• Multiple on-chip memory banks for
  vector-based processing
• High ratio of memory to logic
• Fast embedded multipliers
• 16 pre-engineered clock domains
  to support the multiple frequencies
  and multiple-phase requirements of
  complex system design.



To illustrate the requirements of DSP 
applications, we chose a radar signal 
processor that uses a 16-channel channel- 
izer with a polyphase FIR filter and FFTs to 
divide the incoming data into multiple fre- 
quency channels for real-time processing. 
Refer to Figure 1 to see the data flow and 
processing required by this application. 

The data comes into the system at 1200
MegaSamples/second with 8 bits per sample.
This stream is broken into 16 data
streams and processed with polyphase FIR
filters and FFTs. The resulting data is again
split into 16 channels, the nine most
interesting of which are chosen for further
processing. Channels 1-3 are sent into the first
radar signal processing module, 4-6 are
sent into the second radar signal processing
module, and 7-9 are sent into the third
radar signal processing module, all at 300
MB/s per channel.


Figure 1 - Channelizer block diagram
The pre-channelizer raw data, at 1200
MB/s, is divided into three streams of 400
MB/s each. Each stream is stored in its own
SDRAM block. The appropriate raw data
is folded back into the radar signal
processing with the channelizer-processed data.
The radar signal processors perform filters,
FFTs, and other DSP functions on the
data. The final result is sent to buffer
memory, and then out to disk at 600 MB/s.
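
The data rates quoted in this walkthrough follow from simple arithmetic, which the short sketch below reproduces. The constant and function names are invented for the example; the numbers themselves are the ones stated in the text (1200 MegaSamples/s at 8 bits, nine channels at 300 MB/s, three processing modules, three raw streams).

```python
# Figures quoted in the article for the 16-channel channelizer example.
SAMPLE_RATE_MSPS = 1200      # input rate, megasamples per second
BITS_PER_SAMPLE = 8
SELECTED_CHANNELS = 9        # channels kept for further processing
CHANNEL_RATE_MB_S = 300      # MB/s per selected channel
PROCESSING_MODULES = 3       # radar signal processing modules
RAW_STREAMS = 3              # raw data is split into three streams


def channelizer_rates():
    """Derive the aggregate and per-module rates, all in MB/s."""
    raw_total = SAMPLE_RATE_MSPS * BITS_PER_SAMPLE // 8
    channel_total = SELECTED_CHANNELS * CHANNEL_RATE_MB_S
    return {
        "raw_total": raw_total,
        "raw_per_stream": raw_total // RAW_STREAMS,
        "channel_total": channel_total,
        "channel_per_module": channel_total // PROCESSING_MODULES,
    }


if __name__ == "__main__":
    for name, rate in channelizer_rates().items():
        print(f"{name}: {rate} MB/s")
```

With one byte per sample, the 1200 MegaSamples/s input corresponds to the 1200 MB/s raw rate, and each of the three raw streams carries 400 MB/s, as stated above.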


Use COTS Hardware from Annapolis
for Fast Deployment

On the right side of Figure 2 is the 
Annapolis WILDSTAR™ II VME board. 
This board is available with one, two, or 
three Virtex-II 6000 or 8000 FPGAs, with 
up to 72 MB of DDR2 SRAM in 18 banks, 
up to 384 MB of DDR SDRAM in three 
banks, and programmable flash memory for 
storing FPGA files for fast reconfiguration. 


On the top left side of Figure 2 is the
Annapolis 1.5 GHz A/D I/O card, which,
for this application, plugs into the top slot
of the WILDSTAR II card. This board
comes with a MAX 104 or MAX 108 8-bit
A/D converter, one Virtex-II 1000 or
3000, one Virtex-E 1000 or 2000, with up
to 2 MB of DDR2 SRAM accessible by the
Virtex-II bridge PE and up to 16 MB of
ZBT SRAM in four banks accessible by the
Virtex-E PE.

On the bottom left side of Figure 2 is
the Annapolis Fibre Channel 2 I/O card,
which, for this application, plugs into the
bottom slot of the WILDSTAR II card.
This board has four full duplex Fibre
Channel 2 I/O channels, with peak rates of
200 MB/s each way per channel. The board
comes with two QLogic ISP2312s, a
Virtex-II 4000, 264 MB of DDR SDRAM
in four banks, and an IBM PowerPC™
405 running Linux.

Figure 3 shows the boards connected
together and ready to fit into one slot in the
VME chassis. The ADC on the 1.5 GHz
A/D I/O card performs the analog input
and A/D conversion. The Virtex-II and
Virtex-E FPGAs on the 1.5 GHz A/D I/O
card create parallel data streams and
perform the channelizer function, using
polyphase filters, FFTs, and other DSP
functions, as well as data reduction.
Figure 2 - WILDSTAR II
for VME with 1.5 GHz A/D
and Fibre Channel 2 I/O cards

Figure 3 - WILDSTAR II for VME
with I/O cards ready to insert in chassis


The channelized data and raw data are
both split into three paths in the Virtex-II
PE0 on the WILDSTAR II card. Each
Virtex-II PE on the WILDSTAR II performs
radar signal processing functions. The
Virtex-II PE1 on the WILDSTAR II card
gathers and processes the results for output
to the Fibre Channel 2 I/O card.

The Virtex-II FPGA on the Fibre
Channel 2 I/O card accepts the data from
the WILDSTAR II card, buffers it, and
sends it out to disk via the four Fibre
Channel 2 channels with the help of the
QLogic and PowerPC chips.

Table 1 is a comparison of the system 
data transfer speeds provided by this 
Annapolis system to the data transfer 
speeds required by this channelizer applica- 
tion. You can see that the system easily 
meets the throughput requirements for the 
channelizer application. 
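
One of these comparisons can be checked directly from figures given earlier in the article: the four Fibre Channel 2 channels, at a peak of 200 MB/s each way per channel, against the 600 MB/s disk output rate. The sketch below is only that back-of-the-envelope check. The names are invented for the example, and it uses peak rates, so sustained throughput in a real system may be lower.

```python
# Figures taken from the article text; peak rates, not sustained rates.
FC_CHANNELS = 4              # full duplex Fibre Channel 2 channels
FC_PEAK_MB_S = 200           # peak MB/s each way, per channel
REQUIRED_DISK_MB_S = 600     # channelizer output rate to disk


def fibre_channel_headroom(channels=FC_CHANNELS,
                           peak=FC_PEAK_MB_S,
                           required=REQUIRED_DISK_MB_S):
    """Return (capacity, spare capacity, utilization) for the disk path."""
    capacity = channels * peak
    return capacity, capacity - required, required / capacity


if __name__ == "__main__":
    capacity, spare, utilization = fibre_channel_headroom()
    print(f"capacity {capacity} MB/s, spare {spare} MB/s, "
          f"utilization {utilization:.0%}")
```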

These WILDSTAR II and I/O boards
are the eighth generation of Xilinx
FPGA-based, high-performance processing
boards produced by Annapolis Micro
Systems. Annapolis continues to push the
high-performance envelope, using the
latest standard Xilinx FPGAs.


You Can Build Your Application Quickly 
and Easily with CoreFire 
You can see that the channelizer applica- 
tion fits on the chosen WILDSTAR II sys- 
tem, so acquiring readily available 
hardware for your application will be easy. 
The next step is to figure out how you 
will develop the host software and FPGA 
implementations for your application. 
Remember, this project stretches across
three different printed circuit cards and six
different FPGAs.
Data Path | Channelizer Data Transfer Speed Requirements | System Data Transfer Speed
Input to ADC Speed | 1200 MegaSamples/s |
ADC to PE Speed | |
A/D I/O to PE0 on WS II | |
WS II PE to its SDRAM | |
PE0 to PE1 | |
PE0 to PE2 | |
WS II PE2 to FC II I/O | |

Table 1 - System data transfer speeds versus channelizer requirements


The classic VHDL methodology for 
implementing applications on FPGAs is 
difficult, and requires expert knowledge 
and countless months of painstaking work. 
You cannot wait months to deploy your 
product. You need a tool that will allow 
you to deploy your project within weeks, 
not months or years. You must be able to 
develop new application files rapidly and 
easily, as well as accommodate specification 
changes, functional additions, and algo- 
rithm development. 

Using the CoreFire FPGA Design Suite 
from Annapolis, you can implement each 
of your algorithms in as little as a few 
hours. Use the standard WILDSTAR II C 
or Java API to write your host program. 
The CoreFire board support packages 
handle all the I/O, memory, and FPGA 
interfaces seamlessly, providing excellent 
performance. Refer to the CoreFire screen 
display in Figure 4. 

CoreFire is a graphical user interface 
FPGA application development tool that 
allows you to build your application very 
quickly by dragging and dropping library 
elements onto the design window. Choose 
from more than 400 expertly crafted mod- 
ules. Modify your input and output types, 
numbers of bits, and other variables by 
changing module parameters with pull- 
down menus. Move modules around on 
the screen and reconnect with a flick of 
the mouse. 

The modules automatically provide
correct timing and clock control. Insert debug
modules to report actual hardware values
for in-the-loop debugging. Hit the Build
Button to check for errors and sizes and to
build an encrypted EDIF file. Use the
Xilinx ISE tool to place-and-route each
FPGA design.

Modify and use the jar file created by 
the CoreFire build to load your new file 
into your WILDSTAR II and I/O card 
hardware. Use the CoreFire debugger to 
view and modify register and memory con- 
tents in the FPGA, and to step through the 
data flow of your design running in the real 
physical hardware. 

Armed with your debug results, you will
find it very easy to use the CoreFire design
window to modify and rebuild your FPGA
design until you are satisfied with the
results. Use the CoreFire program to build
and debug each of your FPGA designs, and
then use the jar file and the WILDSTAR II
API to develop your overall host program.

Figure 4 - CoreFire screen display


Conclusion 
It is easy to push the DSP performance
limits with Virtex-II FPGAs and
Annapolis Micro Systems boards. Some of
our customers have gone from initial
inquiry to first deployment in as short a
time as two months.

When you buy high-performance, world- 
class, Virtex-II-based off-the-shelf hardware 
from Annapolis, it is easy to build and mod- 
ify applications. You have more time to fine- 
tune your algorithms. You can get prototypes 
up and running sooner, so you have more 
time to test market your product. 

Final development is just as easy. You
will be in the market far ahead of your
competition, saving time and money. To
learn more, contact Annapolis Micro
Systems, Inc., at 410-841-2514, or visit
our website at www.annapmicro.com.
ISE 5.2i Further Reduces
Your Design Costs


provide a low-cost, low-risk, and 
high-performance logic solution. 

by Mark Goosman / Lee Hansen 

Product Marketing Managers 


Xilinx, Inc. 
mark.goosman@xilinx.com, lee.hansen@xilinx.com 


Reducing project costs isn't new to most
designers, but in tight economic times the
pressure to bring project costs down
becomes much greater. In his Xcell
Journal Winter 2002 article titled “When
Total Cost Management Counts, Xilinx
PLDs Pay Off”
(www.xilinx.com/publications/products/cool2/xc_tcm43.htm),
Eric Thacker described how programmable logic
devices (PLDs) offer significant benefits in
dynamic, rapidly changing markets.

With our ISE development systems and
development options, Xilinx not only
supports the benefits of PLDs but also offers
additional cost savings. ISE 5.2i, the latest
release of our design software, delivers a
number of productivity technologies that
shorten logic design flow, optimize design
results, shorten implementation and
verification cycles, and provide interactive design
assistance. At the same time, ISE 5.2i
enables you to realize even faster design
performance. The end result for you is cost
savings across your entire project.

The shorter design cycles and
time-to-market advantages of FPGAs and CPLDs
mean that you need fewer engineering
resources. This allows you to make the best
use of your staff when difficult economic
conditions restrict your ability to hire more
engineers. Our fast, efficient, and highly
productive ISE software tools help you get
the job done in less time, and they make
each engineer more productive.
Free ISE WebPACK
The ISE WebPACK™ design suite is the
ideal Web-downloadable desktop solution.
It offers a complete development
environment with modules from ABEL
and HDL synthesis to device fitting and
JTAG programming. ISE WebPACK
tools are a subset of our award-winning
ISE Foundation™ design tools, providing
instant access to the ISE tools at no
cost. By providing a design solution that
is always up-to-date, with error-free
downloading and single file installation,
Xilinx has created a solution that allows
instant productivity. Because ISE
WebPACK development tools are available
for download from the Xilinx website
at www.xilinx.com/ise/webpack5,
you can get started immediately on
designs for leading Xilinx CPLDs and
mid-density FPGAs.

This Web-downloadable design solution
reduces your design costs by including
all the tools you need to complete
your design.


ISE WebPACK includes:

• ModelSim Xilinx Edition (MXE-II)
  Starter Version
  ModelSim® XE is a complete HDL
  simulation environment that has been
  optimized for programmable logic
  design, enabling you to quickly verify
  source code and functional and timing
  models of your design.

• HDL Bencher
  Within the ISE WebPACK toolset, the
  HDL Bencher™ test bench generator
  automatically imports the current HDL
  design file and creates an editable
  stimulus waveform by default.

• StateCAD
  The StateCAD® FSM wizard automates
  the state machine design process. You
  can specify complex state machines to
  quickly meet tough product requirements.
  The state machines can then be
  automatically translated to an HDL
  format you can include in your design flow.

• ChipViewer
  ChipViewer is a pre- and post-fit graphical
  utility to assign or view pin placement
  and implemented logic for all
  Xilinx CPLD devices. This removes the
  risks associated with changes late in the
  design process.

• XPower
  XPower is a graphical power-analysis
  tool. You can easily analyze total device
  power and power per net for fitted, routed,
  partially routed, or unrouted designs.
Optimized Design Performance
and Device Utilization

Xilinx ISE design tools have raised the 
industry standard for both design perform- 
ance and device utilization. Through patent- 
ed implementation algorithms, ISE allows 
you to achieve the fastest possible design 
performance. Compared with competitive 
solutions, designs can achieve better than 
15% higher performance. 

This performance edge means you can 
potentially target a lower cost device — lever- 
aging faster performance from the software. 
Thus, you can hit your timing goals earlier, 
spending less time in the design flow. 

For example, based on benchmark data, 
you can achieve 20% to 30% better per- 
formance in Virtex-II Pro™ designs using 
ISE than you can get from an offering from 
the leading competitor. In many cases, you 
can target your design to a slower speed 
grade device and still achieve targeted 
design performance. 

ISE also reduces project costs by packing 
more logic into Virtex™-II devices, letting 
you fit your design in the smallest possible 
device. Advanced FPGAs are not solely 
made of look-up tables and flip-flops any- 
more. Today’s logic fabrics are best described 
as “feature rich.” This trend requires sophis- 
ticated algorithms in both synthesis and 
implementation tools, providing optimal 
performance and logic utilization by leverag- 
ing new hardware features. 

Xilinx ISE development tools separate
unrelated functions and assign them to
different clusters (called slices) on the fabric.
This avoids conflicting placement constraints
and guarantees optimal performance. As the
device gets full, powerful algorithms pack
unrelated logic into common clusters. This
gradual process ensures that the device is
utilized at its best, with minimal impact on
design performance.

With competing FPGA development
tools, the packing of logic requires a
special option. Even with this option turned on,
packing is limited, because an unrelated
LUT using all four of its inputs and a flip-flop
cannot be merged together in any logic element,
and the limited packing comes at a cost to
design performance. Virtex-II logic utilization
with ISE comes out 15% better than
the nearest competitive offering.

In the Virtex-II fabric, the LUT and flip-flop
can be used independently, without
restrictions. In Stratix devices, a LUT cannot
be used with its flip-flop in all circumstances,
because one input pin of the LUT
is shared with the path that has direct access
to the flip-flop. By default, when the flip-flop
is not fed by any logic, the LUT in that
LE is unavailable to the rest of the design.
As a remedy, Quartus II tools provide a register
packing option (off by default) to
enable the packing of LUTs along with the
flip-flop. This still does not allow LUTs
using all four inputs to be packed, because the
connectivity restriction is still present.
Figure 1 shows LUT to flip-flop connectivity
in both Virtex-II and Stratix devices.

Figure 1 - Connectivity LUT to flip-flop in Virtex-II and Stratix devices
(panels: Virtex-II, 2 true independent paths; Stratix, shared pin on the LUT)


Advanced Technology Streamlines
the Design Flow
ISE is also packed with advanced software
technology designed to accelerate the more
time-consuming parts of the design and
debug logic flow.

Incremental Design is a technology
included in ISE that shortens design
re-compile times. By locking performance for
areas of the design that don’t need to change,
Incremental Design lets you perform
re-synthesis and re-place-and-route on only
those pieces of the design that have to
change. This reduction in time adds up fast
in the crucial verification cycle, where debug
changes are common.

The Xilinx ChipScope™ Pro integrated
logic analyzer also delivers added productivity
to the verification cycle. Through small,
easy-to-place software debug cores, the
ChipScope Pro tool allows you to monitor,
in real time, any signal in the FPGA. This
includes the IBM® PowerPC™ 405 peripheral
bus in the advanced Virtex-II Pro
FPGA. Design signals are captured and
brought to the outside world through the
FPGA JTAG programming port. This
minimizes the amount of dedicated FPGA space
and I/O pins required, as opposed to using
more traditional ASIC and competing
FPGA debug methodologies.

Additionally, signal monitor points can
be changed through the ISE FPGA editor
without having to re-compile the design,
saving even more debug time. The
ChipScope Pro analyzer cuts verification
times dramatically, even when the device is
on the board or in the field.


Conclusion 

For logic design, the true cost of the proj- 
ect includes much more than just device 
cost. Factors like development cost, project 
timelines, access to development tools, 
designer efficiency, ability to achieve 
device performance goals, and verification 
costs can have a big impact on the overall 
project cost. 

Xilinx allows you to meet — or beat — 
your project budget through free ISE 
WebPACK development tools, other ISE 
configurations, a complete design envi- 
ronment, ISE’s powerful implementation 
tools, robust verification technology, and 
more. As you evaluate various logic
design solutions, look at the total costs
associated with design tools and designer
resources in addition to the cost of
the device.
Prototype All Xilinx Devices
In-System or Standalone
with MultiPRO Desktop Tool

Now you can program/configure Xilinx devices from your
desktop with a single programming hardware solution.

by Michelle Badal
Marketing Manager, Configuration Solutions
Xilinx, Inc.
michelle.badal@xilinx.com


In today’s competitive landscape, engineers
need flexibility; it’s the key to
maximizing your design efforts. The MultiPRO
Desktop Tool, our latest programming
hardware, offers a number of features to
provide you with the most flexible, low-risk,
and cost-effective way to prototype all
Xilinx devices.


MultiPRO Desktop Tool Increases
Flexibility and Reduces Cost
The MultiPRO Desktop Tool,
released in December, is a complete
programming solution that enables you to
realize the full potential of Xilinx
programmable logic devices. Designed
specifically to interface with a PC via parallel
port (IEEE 1284), the MultiPRO Desktop
Tool provides:
• In-system programming and
  configuration of all Xilinx devices
• Standalone programming of
  CoolRunner™-II CPLDs and
  XC18V00 ISP PROMs
• Comprehensive configuration
  mode support
• A low-cost solution by integrating
  desktop programmer and download
  cable functionality.
In-System Programming
With MultiPRO Desktop Tool, you can
program and configure all Xilinx devices
in-system. MultiPRO Desktop Tool’s flexibility
enables you to program your devices
right at your desk by following these steps:
• Power MultiPRO with +5VDC, via
  an external AC power brick
• Use ISE iMPACT v5.1i SP3 or
  higher on your PC
• Interface the MultiPRO pod with
  the PC via a parallel port (IEEE
  1284) (cable included)
• Connect the MultiPRO pod to
  the PCB with the ribbon cables
  (included)
• Run software to download bitstream
  and program/configure target
  FPGA, CPLD, or PROM
  (Figure 1).


Standalone Programming
MultiPRO Desktop Tool reduces the
risks of prototyping by enabling you
to perform standalone programming
on CoolRunner-II CPLDs and
XC18V00 ISP PROMs. As with in-system
programming, you can program your
devices right at your desk by following
these steps:
• Power MultiPRO with +5VDC,
  via an external AC power brick
• Use ISE iMPACT v5.1i SP3 or
  higher on your PC
• Interface the MultiPRO pod with
  the PC via a parallel port (IEEE
  1284) (cable included)
• Connect the MultiPRO pod to a
  CoolRunner-II CPLD or XC18V00
  ISP PROM adapter (adapters available
  for all package types)
• Run software to program target
  CoolRunner-II or XC18V00 ISP
  PROM (Figure 2).





Figure 2 - MultiPRO with programming adapter


• ISE iMPACT download software
  v5.1i SP3 or higher
• In-system programming of all Xilinx ISP
  PROMs and CPLDs
• In-system configuration of all Xilinx FPGAs
• PC (Parallel Port - IEEE 1284)

Table 1 - MultiPRO Desktop Tool


Comprehensive Configuration Mode Support
The comprehensive configuration mode
support gives you the flexibility to choose
the most suitable mode for your design,
while using only one programmer hardware
solution. MultiPRO Desktop Tool is
supported by ISE iMPACT download software
v5.1i SP3 or higher, and supports JTAG
(IEEE 1149.1), Xilinx slave serial, and
SelectMAP modes (Table 1).


MultiPRO Desktop Tool Features
• Provides standardization
  with J Drive™ IEEE 1532
  programming engine
• Provides a prototype debug environment
  compatible with ChipScope™ ILA and
  ChipScope Pro
• Automatically adapts to target
  I/O voltage.


Conclusion 
MultiPRO Desktop Tool reduces the
risks and costs of prototyping in a number
of ways:
• It provides a single solution that
  meets all your desktop programming
  and download cable needs.
• It frees up more expensive resources
  designed for mass programming
  and configuring of devices.

MultiPRO Desktop Tool provides the
most comprehensive programming solution
to date from Xilinx. For more information
on how to increase flexibility,
reduce risk, and reduce system prototyping
cost, visit:
www.xilinx.com/xlnx/xil_prodcat_product.jsp?title=csd_cables/.


• JTAG (IEEE 1149.1) mode
• Slave Serial mode
• SelectMAP mode
• Desktop Programming mode
by Peter Seng

Managing Director 

SENG digitale Systeme GmbH 

peter@seng.de 

This new design environment from SENG digitale Systeme
GmbH, based on standard parts, makes it easier
to estimate development costs.

Existing application-specific development
and production solutions may be fine for
high-volume markets, but for low- and
mid-range product volumes, these solutions
are most often either too expensive
or otherwise not suitable. As a result,
these types of products are usually built
with off-the-shelf PC- or microcontroller
board-based components. Although standard
components are relatively low cost,
they are rarely a perfect fit for your design
specification and almost always require
you to develop and add specialized hardware.
Programmable FPGA-based hardware
offers an attractive alternative, but
so far, entrance barriers have been high,
and development time and costs difficult
to estimate.

SENG digitale Systeme GmbH recently
introduced a scalable development environment
that makes it easy to prototype
and produce programmable FPGA-based
hardware using standard parts and at
predictable costs. The core of the SENG
environment is the Digital Logic Kernel
(DLK), which consists of an FPGA + CPU
+ memory + PC interface. The DLK uses
standard Xilinx parts and includes FPGA and
CPLD source codes, software, and design
rules, thus eliminating the need for programming
equipment or for preprogramming parts before use.
The DLK is, by default, a self-bootable
device. To exchange data or for administration
purposes, the DLK is accessed via an
integrated PC parallel printer port interface.
All you need to build, program, and
service devices in the field is a PC with a
parallel printer port (Figure 1).


The DLK Concept
The DLK is well suited for designing,
building, and programming products
containing any kind of hard or soft CPU
with internal and external memory, and
programmable logic. The system
interconnect concept makes the DLK
appropriate for several CPU, FPGA, and
CPLD device families.

Figure 1 - DLK concept


Components 

The DLK development kit consists of soft- 
ware and hardware source code, software, 
schematics, and a demonstration board. 
Components include: 


• FPGA internal 8-bit bidirectional interface to the PC with parallel I/O bus structure (Figure 2)
• Driver and software for several operating systems, including Win 9x- and WinNT 4.0-based OS
• Flash CPLD-based state machines and memory
• FPGA internal emulation of JTAG programming interface
• Administration software running on PC for Windows™ OS, including dynamic link library, application program, and source code (Figure 3)
• Demonstration board (Figure 4)
• FPGA source codes (varies with grade of license)
• 8032 C sample source code containing LCD, UART, interrupt, and I²C routines.


DLK Design Flow 


1. Design an application-specific board according to the DLK principles, or use one of the available DLK boards.
2. Rearrange FPGA, CPLD, and PC software according to your specific needs; compile.
3. Connect application-specific board to PC parallel port; configure it with JTAG emulation bitstream using DLK software.
4. Use iMPACT software to embed flash memory on board CPLD.
5. Develop application-specific logic and CPU programs; compile.
6. Download FPGA bitstream to the board and communicate with it using DLK routines.
7. Debug your application.
8. Store the final FPGA configuration bitstream and CPU program in flash memory; disconnect from PC. The application is ready.
9. For administration, upgrade, or communication purposes, just reconnect board to PC.

Figure 2 - PC parallel port interface, ISE schematic symbol (block "parport", modes SPP+PS2+EPP; ports PP_D(7:0), DIN(7:0), DOUT(7:0), PP_nBUSY, PP_ERROR, PP_SLCT, PP_PE, PP_nSLCTIN, PP_nSTROBE, PP_nAUTOFD, PP_INIT, PP_ACK, CLK)


Conclusion 
The DLK concept is an open system envi- 
ronment that uses standard tools and readily 
available elements, enabling development of 
digital products quickly and cost-effectively. 
This development system is especially suited 
for developing prototypes and for low- and 
mid-range product volumes. The system is 
compatible with Xilinx ISE design tools and 
Windows operating systems. The kit 
includes several real-world examples to min- 
imize training. All source code is available. 
DLK development kits, including demo 
boards, are available now, starting at $298. 
System and support are available from SENG digitale Systeme GmbH. Write to info@seng.de or visit www.seng.de for more information.

Figure 4 - DLK51 board

Use SpyGlass Predictive Analysis for Effective RTL Coding

Atrenta Inc.’s SpyGlass software uses a “look-ahead” engine based on fast-synthesis technology to help you identify potential problems, and fix them, early in the design process.







by Bhanu Kapoor 
Technology Director 
Atrenta Inc. 
bkapoor@atrenta.com 


Decisions made early in the design process affect the entire chip design process. Thus, to manage designs effectively, the tools you use for RTL (register transfer level/language) design must enable a stepwise refinement of the code by predicting likely downstream problems. For example, to achieve high performance, the guidelines for synchronous design must be met, as they play an important role toward meeting timing objectives.
Recognizing the importance of adher- 
ence to appropriate guidelines during the 
RTL coding process, Xilinx recommends a 
set of coding guidelines for designs target- 
ing the various Xilinx device families. Most 
designers currently rely on manual design reviews to check whether coding guidelines have been met. This manual method is both error-prone and time-consuming. Automating adherence to RTL coding guidelines, therefore, would greatly increase the timely success of projects targeting high-performance designs on Xilinx device families, such as the Virtex™-II series of platform FPGAs.
The SpyGlass™ Predictive Analyzer tool from Atrenta Inc. automates the process of meeting design guidelines through the use of an underlying predictive analysis technology. This approach performs detailed structural analysis on RTL code to check for coding styles, RTL handoff, design reuse, clock/reset requirements, verification, timing guidelines, and much more. The SpyGlass tool employs a “look-ahead” engine based on fast-synthesis technology and a fast, built-in, cycle-based simulator to carry out such analyses. This look-ahead methodology uses only RTL code as input to the SpyGlass tool and does not require any vectors or assertions. It is therefore very easy to set up and run.


Policy-Based RTL Coding 

The SpyGlass™ Predictive Analyzer is a comprehensive, policy-based system that defines, in a succinct and organized way, design policies that automatically point out time-consuming downstream issues. The SpyGlass tool then offers suggestions on ways to overcome these downstream issues during the RTL code development process. This helps you to meet your time-to-market target.


Policy 

A policy is a collection of rules for a specific purpose, such as rules associated with a standard, a silicon vendor, or a specific design tool. Policies enhance user extensibility, allowing you to develop and manage customized groupings of rules more easily. The design methodology includes a set of policies that you can select during the process of RTL code development. Examples of such policies include lint, reuse, verification, timing, and testability.


Rule Groups 

A collection of rules within a policy is termed a group. Typically, groups consist of rules addressing a particular area of interest in the RTL code. Groups are hierarchical, meaning that a group can contain other lower-level rule groups as well as individual rules. A group provides an additional level of modularity in applying policies to a given RTL design.

Figure 1 - Example of policy consisting of rule groups and rules


Rules 

A rule is the most fundamental element in the policy-based management system. It describes a set of conditions that, when checked by the policy engine, result in an indication of a specific problem with the RTL code. Rules allow standard analysis of the RTL code. An example of a policy, rule groups, and rules in the SpyGlass system is shown in Figure 1. With this system, you can selectively turn on specific groups or specific rules within a group.
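
The policy, group, and rule hierarchy can be pictured as a small tree. The following is only an illustrative Python model: the class names, the rule names, and the `enabled_rules` helper are hypothetical and do not represent the SpyGlass rule set or its interface. It shows how selecting a group switches on everything beneath it, while a single rule can also be selected on its own.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Rule:
    """Leaf node: a single check, identified by name."""
    name: str


@dataclass
class Group:
    """A group holds rules and, because groups are hierarchical, other groups."""
    name: str
    members: List[Union["Group", Rule]] = field(default_factory=list)


@dataclass
class Policy:
    """A policy is a named collection of rule groups for one purpose."""
    name: str
    groups: List[Group] = field(default_factory=list)


def enabled_rules(node, selected: Set[str], inherited: bool = False) -> List[str]:
    """Return the names of rules that are switched on.

    A rule is on if it is selected by name or sits beneath a selected group.
    """
    if isinstance(node, Rule):
        return [node.name] if inherited or node.name in selected else []
    if isinstance(node, Policy):
        children, on = node.groups, inherited
    else:
        children, on = node.members, inherited or node.name in selected
    result: List[str] = []
    for child in children:
        result.extend(enabled_rules(child, selected, on))
    return result


# Hypothetical policy; all names are made up for illustration.
timing = Policy("timing", [
    Group("clocks", [
        Rule("single_clock_edge"),
        Group("crossings", [Rule("synchronizer_present")]),
    ]),
    Group("logic_depth", [Rule("max_levels_between_registers")]),
])

# Turn on one whole group plus one individual rule from another group.
print(enabled_rules(timing, {"clocks", "max_levels_between_registers"}))
```

Keeping groups and rules in one recursive structure is a design choice of this sketch; it makes "turn on a group" and "turn on a rule" the same operation applied at different depths.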


Policy Engine 

The policy engine accelerates electronic 
product development by enabling develop- 
ment teams to capture, aggregate, distrib- 
ute, and apply constraints and requirements 
early in the development cycle. The fast 
synthesis engine internally creates a struc- 
tural view of the design and foresees down- 
stream issues early in the development 
cycle, thereby eliminating errors at the ear- 
liest possible stage. 

Although you can check many complex rules statically on the internal synthesized structural view, other rules require some understanding of the logic function of the design. This is particularly true for testability-related checks. In order to perform a testability check, you must use an evaluator. The evaluator in the policy engine is effectively a zero-delay, cycle-based simulator you can use to resolve functional design constraints, as well as carry out a simulation required to set up the design for testability analysis.
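
To make the term concrete, here is a minimal sketch of what a zero-delay, cycle-based simulator does in general: once per clock cycle it evaluates the combinational logic in dependency order with no gate delays, then updates every register at the clock edge. The `simulate` function and the 2-bit counter are hypothetical examples, not the SpyGlass evaluator.

```python
def simulate(comb, next_state, state, inputs_per_cycle):
    """Zero-delay, cycle-based simulation of a synchronous design.

    comb:       list of (signal, function) pairs in dependency order; each
                function maps the dict of current values to the signal value.
    next_state: dict mapping a register name to the signal it samples.
    state:      dict of initial register values.
    Returns one dict of signal values per simulated cycle.
    """
    trace = []
    state = dict(state)
    for inputs in inputs_per_cycle:
        values = {**state, **inputs}
        for name, fn in comb:          # one pass per cycle, no delays modeled
            values[name] = fn(values)
        trace.append(dict(values))
        # Clock edge: every register samples its data input simultaneously.
        state = {reg: values[src] for reg, src in next_state.items()}
    return trace


# Hypothetical design: a 2-bit counter with an enable input.
comb = [
    ("d0", lambda v: v["q0"] ^ v["en"]),
    ("d1", lambda v: v["q1"] ^ (v["q0"] & v["en"])),
]
trace = simulate(comb, {"q0": "d0", "q1": "d1"},
                 {"q0": 0, "q1": 0}, [{"en": 1}] * 4)
print([t["q1"] * 2 + t["q0"] for t in trace])
```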

Policy implementation requires a traversal engine that works on the RTL netlist produced by fast synthesis. The SpyGlass engine generates basic primitives for rules that cover design traversal. The connectivity information, coupled with the traversal primitives, enables you to create rules that look for violations across the design hierarchy.

By applying a policy over a given RTL design code, you can obtain a wealth of information about the design, as well as any violations of the defined rules. The SpyGlass interface highlights rule violations on the specific lines of RTL code. Not only does the SpyGlass predictive analysis tool offer an extensive help function to assist you in understanding the nature of the violation, but also, where appropriate, it makes suggestions for resolving the problem.

Figure 2 - Example of rule violation and associated debug windows

Moreover, each violation is highlighted on a structural view of the design, which can give you additional debugging information for complex problems. As shown in Figure 2, cross-probing between the code and structural views reveals an example of a clock domain synchronization problem. Additionally, a comprehensive set of reporting mechanisms and violation management utilities, such as waivers that allow a specific rule violation at a given point in the design, further facilitates the RTL coding and assessment process.


Coding for Xilinx Devices 
Xilinx recommends a set of coding guide- 
lines for designs targeting the various 
Xilinx device families. These include gener- 
al coding guidelines, as well as synthesiz- 
able HDL guidelines. Furthermore, syn- 
chronous design guidelines help you meet 
timing objectives in high-performance 
designs such as those targeting the Virtex- 
II FPGAs, which can accommodate as 
many as 8 million gates and operate at fre- 
quencies as high as 400 MHz. 

Clock distribution also requires specific design code guidelines for successful implementation. For this reason, it is important to identify the clocks and resets in the design, as well as the clock-tree network, early in the code development process.

Many designs, especially in the net- 
working domain, typically use multiple 
clock domains, a situation that presents 
challenging integration and verification 
issues. Signals that originate in one clock 
domain may be used in other clock 
domains. Correct chip operation can only 
be ensured if rules governing correct use of 
signals across asynchronous clock bound- 
aries are followed. 
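
As a rough illustration of such a rule, the sketch below flags flip-flops that sample a signal from another clock domain without a two-flop synchronizer. The netlist model (`Flop`, its fields, and the flop names) is a hypothetical simplification chosen for this example, and the two-flop criterion is one common convention, not necessarily the exact check any tool applies.

```python
from typing import Dict, List, NamedTuple


class Flop(NamedTuple):
    clock: str            # clock domain that clocks this flip-flop
    drivers: List[str]    # flops whose outputs reach this flop's data input
    direct: bool = False  # True if the input is wired straight from one flop


def unsynchronized_crossings(flops: Dict[str, Flop]) -> List[str]:
    """Return flops that receive another domain's signal unsafely."""
    fanout: Dict[str, List[str]] = {name: [] for name in flops}
    for name, flop in flops.items():
        for driver in flop.drivers:
            fanout[driver].append(name)

    violations = []
    for name, flop in flops.items():
        crosses = any(flops[d].clock != flop.clock for d in flop.drivers)
        if not crosses:
            continue
        loads = fanout[name]
        # First synchronizer stage: fed directly by exactly one source flop and
        # feeding exactly one second-stage flop in the same domain, directly.
        ok = (flop.direct and len(flop.drivers) == 1
              and len(loads) == 1
              and flops[loads[0]].clock == flop.clock
              and flops[loads[0]].direct)
        if not ok:
            violations.append(name)
    return sorted(violations)


# Hypothetical netlist: tx_reg (clk_a) is read twice in clk_b, once through
# a two-flop synchronizer and once through unsynchronized logic.
flops = {
    "tx_reg": Flop("clk_a", []),
    "sync1": Flop("clk_b", ["tx_reg"], direct=True),
    "sync2": Flop("clk_b", ["sync1"], direct=True),
    "bad_reg": Flop("clk_b", ["tx_reg", "sync2"]),
}
print(unsynchronized_crossings(flops))
```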

The routability of a design is another aspect that can benefit greatly from following appropriate coding guidelines. The number of I/Os, fanout of intermediate nodes, and widths of internal buses must meet specified limits for various parts of a given design.

Using the SpyGlass predictive analysis 
system, you can check for all of the above 
violations, problems, and issues during 
the RTL coding process itself. This saves 
many design iterations that might other- 
wise be needed to fix errors later in the 
design cycle. 

Specifically, you can use the SpyGlass tool to check the following issues relevant to Xilinx FPGA designs:

• Single clock edge is used in the design to clock the data.
• Internally derived or generated clocks and set/resets are not used in the design.
• Appropriate synchronizers are used as signals cross clock-domain boundaries.
• The number of levels of logic between registers is less than a specified value.
• The fanout of any design node does not exceed a specified number.
• Unintended latch inferences are avoided in the code.
• If-then-else or case statements are not nested deeper than three levels.
• Naming conventions are followed while coding pipeline logic.
• Assess memory requirements and find out feasibility of conversion into regular flops, distributed memory, and block memory.
• For state machines, separate the next-state decoding and output decoding into two discrete processes or always blocks.
• Outputs from design units are registered.
• Identify dead code and floating nodes in the design.


Several of these checks are critical for meeting timing objectives in high-performance designs.
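
Two of the checks listed above, logic levels between registers and fanout, reduce to simple graph computations on a synthesized netlist. The sketch below assumes a hypothetical netlist format (a dictionary from each gate output to the signals it reads) and hypothetical limits; it illustrates the idea only and is not how SpyGlass represents a design.

```python
from functools import lru_cache
from typing import Dict, List

# Hypothetical netlist: each combinational gate output maps to the signals it
# reads. Names that are not keys are register outputs or primary inputs.
GATES: Dict[str, List[str]] = {
    "n1": ["reg_a", "reg_b"],
    "n2": ["n1", "reg_c"],
    "n3": ["n2", "n1"],
    "n4": ["reg_a"],
}


def logic_levels(gates: Dict[str, List[str]]) -> Dict[str, int]:
    """Gate levels between the nearest register/input and each gate output."""
    @lru_cache(maxsize=None)
    def depth(signal: str) -> int:
        if signal not in gates:
            return 0                      # register output or primary input
        return 1 + max((depth(s) for s in gates[signal]), default=0)
    return {gate: depth(gate) for gate in gates}


def fanout(gates: Dict[str, List[str]]) -> Dict[str, int]:
    """Number of gate inputs driven by each signal."""
    counts: Dict[str, int] = {}
    for inputs in gates.values():
        for signal in inputs:
            counts[signal] = counts.get(signal, 0) + 1
    return counts


def check(gates, max_levels: int, max_fanout: int) -> List[str]:
    """Report every node that exceeds the given limits."""
    messages = []
    for gate, levels in sorted(logic_levels(gates).items()):
        if levels > max_levels:
            messages.append(f"{gate}: {levels} logic levels exceeds {max_levels}")
    for signal, count in sorted(fanout(gates).items()):
        if count > max_fanout:
            messages.append(f"{signal}: fanout {count} exceeds {max_fanout}")
    return messages


for line in check(GATES, max_levels=2, max_fanout=1):
    print(line)
```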


Conclusion 
We have described the elements of policy-based design methodology, enabled through SpyGlass predictive analysis, for effective RTL coding leading to efficient implementation of designs on Xilinx-based platforms. This approach performs detailed structural analysis on RTL code to check for coding styles, design reuse, clock/reset requirements, verification, timing guidelines, and more. SpyGlass software includes the most comprehensive coverage of Xilinx-recommended coding guidelines that are essential for reuse and efficient design implementation.

To learn more about SpyGlass predictive analysis, go to www.atrenta.com.


Spring 2003 


Altium. 


Making Electronics Design Easier™ 








Enter the next dimension 


As a professional engineer you know that electronics 
design involves many dimensions. Capturing design data, 
analyzing and verifying the circuit's performance, 
synthesizing VHDL for FPGA implementation - these are 
not fragmented processes, but multiple dimensions of a 
single design flow. That's why Altium has pioneered 
nVisage DXP - the first design capture system to go beyond 
current one-dimensional schematic capture systems. 


With nVisage DXP you can capture the multiple dimen- 
sions of your design within a single application. nVisage 
includes multiple design entry methods that support both 
schematic and VHDL based entry. nVisage also gives you 
multiple analysis and verification tools, such as integrated 
XSpice/SPICE 3f5 circuit simulation, pre- and post-layout
signal integrity analysis, and VHDL simulation. For circuit
implementation, nVisage's extensive output options 
for board layout are complemented by powerful RTL-level 
VHDL synthesis for multiple FPGA target architectures, 
allowing you to design for both programmable devices 
and PCBs within the one application. 


Why not envisage a new dimension in design and take a look at the nVisage multimedia demonstration at www.nVisage.com. You can also download a free 30-day trial version from the website or order the trial CD by calling 800-470-2686.


Multi-dimensional design capture for every engineer

Get RealFast RTOS with Xilinx FPGAs

Real-time operating systems implemented in Xilinx FPGAs enhance performance, improve predictability, simplify design, and lower system costs.






by Tommy Klevin 
Product Manager 

RealFast 
tommy.klevin@realfast.se 


Designers once used field programmable 
gate arrays (FPGAs) as the glue logic to inter- 
connect discrete components on a printed 
circuit board (PCB). As we start a new era, 
though, we can now build complete systems 
in one FPGA chip. 

To take advantage of the possibilities offered by Xilinx Platform FPGAs, we must consider the kind of operating system best suited for these one-chip systems, as well as other systems that contain more than one central processing unit (CPU) or one digital signal processor (DSP).

The RealFast company in Sweden has many years of experience in FPGA design, in operating systems, and in real-time systems development. We recently developed the Sierra 16 real-time operating system (RTOS), implemented in an FPGA, to enhance performance and predictability and to reduce system complexity. This article describes the Sierra 16 RTOS, explains what it can do, and explores the possibilities of putting operating system functions into hardware.

Figure 1 - Block schematics of the Sierra 16 HW-RTOS (Operating System Accelerator). Irq = Interrupt Handler; Tmg = Time Management Handler; Sem = Semaphore Handler; GBI = Generic Bus Interface; TDBI = Technology Dependent Bus Interface.


Sierra 16 RTOS 


Why implement an operating system in 
hardware? Is it not better to have the oper- 
ating system in software as we are used to? 
Well, who would have imagined in the 
mid-1980s that mathematical operations 
would be performed by hardware in the 
CPU instead of software? A closer look at 
operating systems shows that many of the 
low-level primitives are similarly independ- 
ent of the operating system. These primi- 
tives can be performed by either hardware 
or software. 


Some example low-level primitives are:

• Task handling: create and schedule tasks
• Synchronization: semaphores, message passing
• Time handling: delay, timeout handling
• Interrupts.


What is the point of doing these opera- 
tions in a hardware FPGA system instead 
of in software? Even in the new platform 
FPGAs, memory is limited if you do not 
add external memory. In small, cheap 
products, you probably want to squeeze 
your application into the available memory 
in the FPGA. If you use a traditional oper- 
ating system, the operating system eats sub- 
stantial memory, leaving less space for the 
application. 

On the other hand, platform FPGAs contain logic that is just waiting to be used. By moving the operating system to logic, you save a lot of space in memory, and at the same time, you increase performance and achieve a fully predictable system.

The Sierra 16 RTOS offers the same functionality found in traditional software real-time operating systems. As shown in Figure 1, the Sierra 16 RTOS has the following configuration:

• Handles as many as 16 tasks at eight priority levels
• Supports 16 semaphores
• Handles timing: delay, timers
• Supports eight external interrupts.





Figure 2 - ISR is transferred 
to the ready state when an 
external interrupt occurs. 


In the coming months, RealFast will complement the Sierra series with other configurations to suit both small and large systems.


Interrupt Handling 
Interrupt handling is critical in most 
systems and is particularly critical in real- 
time systems. Interrupts introduce unpre- 
dictable behavior because of the difficulty 
in predicting when they will arrive. The 
Sierra 16 RTOS has an intelligent inter- 
rupt handler that makes it possible to 
achieve fully predictable system behavior, 
even though it is not possible to know 
exactly when the interrupts will arrive. 
We treat the interrupt service routines (ISRs) like any ordinary task with a certain priority in the system. As shown in Figure 2, when not taking care of any service, ISRs are in a wait state. When the interrupt arrives, the ISR is transferred to the ready state in the Sierra 16 RTOS scheduler. ISR tasks execute by priority. To switch between different tasks, the Sierra 16 RTOS interrupts the CPU with a task-switch interrupt. This is the only interrupt that interrupts the CPU. All mechanisms are handled by the Sierra 16 RTOS, from the physical pin, to scheduling, to interrupting the CPU so it can switch from the present task code to the ISR code and start executing it.
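
The behavior described above can be mimicked with a small software model. This is purely an illustrative sketch: the class and method names are hypothetical, the real Sierra 16 does this in FPGA logic, and the assumption that priority 0 is the highest is ours, since the article does not say which end of the range is highest.

```python
class HwSchedulerModel:
    """Toy model of a priority scheduler that treats ISRs as ordinary tasks."""

    MAX_TASKS = 16        # from the article: as many as 16 tasks
    PRIORITY_LEVELS = 8   # from the article: eight priority levels
    IRQ_LINES = 8         # from the article: eight external interrupts

    def __init__(self):
        self.priority = {}       # task name -> priority (0 = highest, assumed)
        self.state = {}          # task name -> "ready" or "wait"
        self.isr_for_irq = {}    # external IRQ line -> ISR task name
        self.running = None
        self.task_switches = []  # task-switch interrupts raised to the CPU

    def create_task(self, name, priority, irq=None):
        if len(self.priority) >= self.MAX_TASKS:
            raise ValueError("task table full")
        if not 0 <= priority < self.PRIORITY_LEVELS:
            raise ValueError("priority out of range")
        if irq is not None:
            if not 0 <= irq < self.IRQ_LINES:
                raise ValueError("IRQ line out of range")
            self.isr_for_irq[irq] = name
        self.priority[name] = priority
        # An ISR task waits for its interrupt; other tasks start ready.
        self.state[name] = "ready" if irq is None else "wait"
        self._reschedule()

    def external_interrupt(self, irq):
        """The interrupt moves the ISR task from wait to ready."""
        self.state[self.isr_for_irq[irq]] = "ready"
        self._reschedule()

    def isr_finished(self, name):
        """The ISR task goes back to waiting for its next interrupt."""
        self.state[name] = "wait"
        self._reschedule()

    def _reschedule(self):
        ready = [t for t, s in self.state.items() if s == "ready"]
        # Highest priority wins; on a tie, keep the task already running.
        best = min(ready, default=None,
                   key=lambda t: (self.priority[t], t != self.running))
        if best != self.running:
            self.running = best
            self.task_switches.append(best)  # the only interrupt the CPU sees


sched = HwSchedulerModel()
sched.create_task("idle", priority=7)
sched.create_task("control", priority=3)
sched.create_task("uart_isr", priority=1, irq=0)
sched.external_interrupt(0)
sched.isr_finished("uart_isr")
print(sched.task_switches)
```

In this model the CPU never sees the external interrupt itself, only the resulting task switch, which is the property the text credits for predictable behavior.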


Figure 2 labels: Interrupt occurs; ISR task is moved from the "wait for IRQ" state to the ready state.

Figure 3 - On-chip monitoring system (FPGA System-on-Chip)


Nonintrusive Monitoring and Debugging 
Run-time observability in embedded sys- 
tem architectures is a requirement for test- 
ing, debugging, and validating design 
assumptions made about the behavior of 
the system and its environment. The clas- 
sic approach to run-time observability is 
to apply monitoring — the process of 
detecting, collecting, and interpreting 
run-time information regarding the sys- 
tem’s execution behavior. 

When monitoring real-time systems, 
an important aspect is to minimize — or 
better yet, completely avoid — the intru- 
siveness of the monitor on the system’s 
timing and execution properties. Failure 
to handle monitor intrusiveness may lead 
to probe effects that cause nondeterminis- 
tic behavior in programs with race condi- 
tions and poor synchronization. 

When we use a hardware RTOS (HW-RTOS) like the Sierra 16 RTOS, we avoid the intrusiveness problem. Because the Sierra 16 RTOS comprises a number of hardware components that know at each moment exactly what is going on in the system, this information can be extracted without any software probes normally used for extracting information. One intellectual property (IP) component available for the Sierra 16 RTOS is the Multiprocess Application Monitor (MAMon). As shown in Figure 3, this monitor is implemented in hardware and listens nonintrusively to the different parts inside the Sierra 16 RTOS.


Multiprocessor Solutions Made Easy 

Both loosely coupled and tightly coupled systems contain more than one CPU. In a loosely coupled system, all CPUs have their own memory for such things as program code and data, and the CPUs communicate through a shared memory. In a tightly coupled system, often called a symmetrical multiprocessor (SMP) system, all CPUs share the same memory and execute code and data from this memory.

The problem with these kinds of systems, in particular SMP systems, is the difficulty of building efficient operating systems that fully exploit multiple CPUs. It takes complex algorithms to handle and synchronize multiple CPUs. Pure software solutions for operating systems with multiprocessor support become inefficient and do not give the desired performance boost compared to single CPU systems.

An RTOS implemented in hardware 
can, on the other hand, perform many par- 
allel operations. This capability provides a 
completely new approach to solving the 
problem with multiprocessor systems. All 
algorithms that are complex and time- 
consuming in software are moved to hard- 
ware. All synchronization of tasks, such as 
semaphores, is handled by the HW-RTOS. 

The Sierra 16 RTOS supports single CPU systems, but RealFast will soon release another HW-RTOS series that provides support for systems with two or more CPUs or DSPs.


MicroBlaze CPU + Sierra 16 HW-RTOS 

A powerful but very inexpensive RTOS solution is to combine a Xilinx MicroBlaze™ soft CPU with a RealFast Sierra 16 HW-RTOS. With a small 150K-gate Spartan™-II FPGA and possibly some external memory, you can create a complete and advanced real-time system at a very low cost.

In bigger Spartan-II and Virtex-II FPGAs, it is even possible to exclude external memory, as the internal block RAMs will be big enough for both driver and application in many cases.

The software driver for the Sierra 16 has a minimal footprint of only a couple of kilobytes. This driver, together with the HW kernel, makes a very fast and predictable RTOS kernel. With a system running at 50 MHz, most of the system calls are finished within 2 microseconds.
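
To put that figure in perspective, a 50 MHz clock has a 20 ns period, so 2 microseconds corresponds to a budget of roughly 100 clock cycles per system call. The helper below is a hypothetical convenience function that just does this arithmetic with integers to avoid rounding surprises.

```python
def cycles_within(clock_hz: int, duration_ns: int) -> int:
    """Whole clock cycles that fit into duration_ns at clock_hz."""
    return clock_hz * duration_ns // 1_000_000_000


# 50 MHz system clock, 2 microseconds (2000 ns) per system call.
print(cycles_within(50_000_000, 2_000))
```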


Conclusion 

The new Xilinx Platform FPGAs offer new 
design approaches and new possibilities. 
Because logic has grown, we need to start 
solving problems in a parallel manner, 
rather than in the sequential manner that 
we are used to. 

An operating system implemented in hardware is a step towards parallel processing, and we believe that these kinds of cost-effective solutions will meet the increasing demands of applications in the future. To learn more about the RealFast Sierra 16 HW-RTOS, visit www.realfast.se/.




Virtex-II Pro Platform FPGAs Deliver Proven Interoperability


With verified interoperability between specialty ASSPs and Xilinx Virtex-II series FPGAs, you can
focus on designing and debugging your system rather than how to make all the parts work together.

by Anil Telikepalli
Marketing Manager, Virtex Solutions
Xilinx, Inc.
anil.telikepalli@xilinx.com


Virtex™-II series Platform FPGAs 
include a rich set of system features such 
as embedded memories, DSP, and I/O 
connectivity with which you will design 
most blocks of your system. But your 
design might need to incorporate special- 
ty ASSPs. These could be large memories, 
optical transceivers, analog filters, A/D 
and D/A converters, specialized DSP/net- 
work processors, connectors, and many 
others. Interoperability of all these devices 
on the printed circuit board (PCB) is a 
tough challenge. 

Xilinx Virtex-II Pro™ and Virtex-II FPGAs solve these challenges by enabling you to connect to almost any ASSP without a hitch. This article will show you how.


Interoperability Can Make or Break Your Design

Making multiple disparate devices compatible, often from different vendors, can be an arduous task involving numerous design and debug iterations. Even if you do succeed in making all the devices work together, the inefficiency of such a design is likely to impose a performance penalty that prevents the overall design from achieving its maximum performance objectives. Unknown interoperability of devices increases the risk of cost overruns and slipped schedules, and can even jeopardize design completion. Starting with a Platform FPGA that delivers interoperability of the devices you will use in your design is critical to predicting project time, cost, and feasibility.


Xilinx Leads in Interoperability 
Xilinx offers complete interoperability solutions with leading ASSPs across a broad range of interface standards. This ensures that the integration of third-party ASSPs will go smoothly, that the cost of design implementation will be predictable, and that your projects will succeed the first time through.

By choosing Virtex-II Platform FPGAs over alternatives that lack powerful capabilities for interoperability, you will lower the risk of slipped deadlines, significantly lower the cost of the project, accelerate design and development cycles, and simplify the expensive system debug process.

With its extensive network of partnerships with ASSP vendors through the Reference Design Alliance Program (the only such program in the programmable logic industry), Xilinx gives you the edge. Xilinx and ASSP vendors work closely to deliver fully interoperable solutions. This allows you to simplify system development and eliminates the tough challenges of making all the different devices on the board work together seamlessly. You can focus your creative energy on product differentiation and maximizing system performance.

Virtex-II Pro Delivers Proven Interoperability

With proven RocketIO™ serial and SelectIO™-Ultra parallel I/Os in Virtex-II Pro Platform FPGAs and IP cores for protocols, the Xilinx SystemIO solution provides ultimate connectivity with ASSPs. The Virtex-II Pro FPGA family was built on the Virtex-II family in a completely scalable framework, specifically to address interoperability and prevent problems common with ground-up competing architectures. Virtex-II Pro FPGAs inherit the entire Virtex-II SelectIO-Ultra technology, which directly addresses connectivity and interoperability with ASSPs using established parallel system interfaces. Virtex-II FPGAs have been widely used as an interface to bridge disparate ASSPs. Given that ASSP vendors have been working with SelectIO-Ultra technology for more than two years, Virtex-II Pro FPGAs give you an unparalleled edge in interoperability over any competing FPGA in the world.

Virtex-II Pro devices support dozens of emerging, established, and even proprietary connectivity standards. The embedded RocketIO serial transceivers enable the widest range of programmable serial bandwidth: 622 Mbps to 75 Gbps, covering emerging serial standards such as Gigabit Ethernet, 10 Gigabit Ethernet XAUI, PCI Express™, SxI-5, TFI-5, Serial RapidIO™, InfiniBand™, and Fibre Channel.

SelectIO-Ultra parallel I/O system interfaces support such standards as SPI-3 (POS PHY 3), SPI-4.1 (Flexbus 4), SPI-4.2 (POS PHY 4), XGMII, RapidIO, PCI, PCI-X, CSIX, HyperTransport™, XSBI, and SFI-4 using 22 single-ended and six differential electrical standards including LVTTL, LVCMOS, PCI, PCI-X, HSTL, SSTL, LVDS, LVPECL, and HyperTransport.

Checking silicon features and electrical I/O standard support is only the first step towards interoperability with ASSPs. Xilinx goes far beyond this by providing pre-verified interface and controller IP cores, jointly verified in hardware with ASSP vendors during several months of engineering testing to guarantee full interoperability. Reference designs and boards are also available to demonstrate true hardware interoperability, as shown in Figures 1, 2, and 3.

Figure 1 - Four RocketIO channels at 2.488 Gbps with four Ignis Optics IGP-2000 OC-48 SFP optical transceivers

Figure 2 - SPI4.2 static and dynamic alignment mode HW verification completed with PMC-Sierra S/UNI-9953

Figure 3 - SPI4.1 HW interoperability with AMCC Ganges-II device

The list of ASSP devices interoperable with Virtex-II series FPGAs is shown in Table 1. The latest list of interoperable devices, along with reference designs, is always available at www.xilinx.com/company/reference_design/interop_solutions.htm. These ASSPs include devices from Intel, AMCC, PMC-Sierra, Mindspeed Technologies, IDT, and other vendors.

In addition, Xilinx actively participates in interoperability events and testing activities. Recently, Xilinx submitted Virtex-II Pro FPGAs for interoperability testing to the 10 Gigabit Ethernet Consortium at the University of New Hampshire's InterOperability Lab, which was attended by 13 participating vendors (ftp://public.iol.unh.edu/pub/10gec/Oct02_GTP_release.pdf).


Xilinx Lets You Choose the Best ASSPs 

The Virtex-II Pro solution permits you to 
design-in the best ASSPs that fit your 
needs. ASSPs come with varying and often 
incompatible interface standards. Lack of 
extensive interoperability capabilities can 
limit design alternatives. For example, 
choosing a network processor ASSP sup- 
porting the HyperTransport standard 
could prevent you from choosing a 
security co-processor ASSP supporting 
only the RapidIO standard. Even if 
these two ASSPs matched your require- 
ments perfectly, you may still be forced to 
choose less suitable ASSPs just because 
they work together. 

With the Virtex-II Pro FPGAs’ support 
for multiple system connectivity standards 
and proven ASSP interoperability, you can 
effortlessly bridge standards and ASSPs. 
Select the right ASSPs that best fit your 
product needs based on capability and 
price, not on how well they work together. 
The Xilinx leadership in interoperability gives you the clear advantage in differentiating your products from the competition.


Conclusion 

Interoperability of disparate devices on the
board is a critical parameter for design success.
Your design productivity, performance,
and even the probability of completion can
be significantly improved by thoroughly verified
interoperability among semiconductor
devices and vendors. Xilinx and the leading
ASSP vendors want you to be confident that
you are in safe hands when working with
their devices.

Start your next design on a robust
platform that has a proven record of several
years of diverse interoperability to its
credit. Focus your creative energy on
product function, performance, and differentiation,
rather than worrying about
whether the components will work
together.


ASSP Vendor   ASSP Device   ASSP Device Type


Interface Standard   Reference Design

AMCC   Ganges-II
ATM, SONET/SDH
Framer/Mapper

Bay Microsystems   Mango

Broadcom
Octal SerDes

Ignis Optics

OC-192, 4xOC-48 POS,

Network Processor   Host/Accountant

2.5 Gbps SFP Optical transceiver   2.488 Gbps per Channel   Yes

IXF1810x Family
10Gb Ethernet

OC-192c POS/GFP &

Coming

MAC/Framer

Mindspeed Technologies

OptiPHY M29730

NSE4256,
IXP1200, SA110

Netlogic Microsystem

PMC-Sierra
(PM3386)

S/UNI-2xGE
Ethernet Controller

PMC-Sierra

PMC-Sierra   S/UNI 9953

POS, ATM, Ethernet

TeraCross   TXQ1450, TXS1400

OC-192/STM-64,
Quad OC-48/STM-16
POS Framer

Scheduler, traffic generation & monitoring

NSE bus/nPu bus

Dual Gigabit   SPI3

10 Gb PHY for

SPI4.2 dynamic alignment

Line Card Interface
Control Link

Velio   VC1003

SONET backplane, GbE SerDes

Velio   vou, 021 vou, 1022   OC-48, OC-192 SerDes   LVDS
Velio   VC1061/1062   Storage Quad SerDes   10 GigE MAC

ZettaCom   ZTM202   Traffic Management   CAM, SRAM, CSIX

Table 1 - Virtex-II series interoperable solutions

Available now, the new Virtex®-II Pro Platform FPGAs herald an
astonishing breakthrough in system-level solutions.

With up to four IBM PowerPC™ 405 processors immersed into the
industry's leading FPGA fabric, Xilinx/Conexant's flawless high-speed
serial I/O technology, and Wind River System's cutting-edge embedded
design tools, Xilinx delivers a complete development platform of
infinite possibilities.

The era of the programmable system is here.


THE POWER OF XTREME PROCESSING

Each IBM PowerPC runs at 300+ MHz and 420 Dhrystone MIPS, and is
supported by IBM CoreConnect™ bus technology. In addition, the FPGA fabric
enables ultra-fast hardware processing, such as TeraMACs/s DSP
applications. With Xilinx's unique IP-Immersion™ architecture, system
architects can now harness the power of high-performance processors,
along with easy integration of soft IP into the industry's highest
performance programmable logic. Designers have absolute freedom to
implement any design they can imagine. Never before has such performance
been achieved enabling hardware accelerated processing and multiple
processing in an off-the-shelf device.


ENABLING A NEW DEVELOPMENT PARADIGM

For the first time ever, system designers can partition and re-partition their
system between hardware and software at any time during the development
cycle, even after the product has shipped. That means you can optimize the
overall system, guaranteeing your performance target in the most cost-efficient
manner. You can also debug hardware and software simultaneously at speed.


[Device diagram callouts: PowerPC Processor, Clock Managers,
Termination with XCITE Technology, Up to 556 Multipliers]


THE ULTIMATE CONNECTIVITY PLATFORM

The first programmable device to combine embedded processors along with
3.125 Gbps transceivers, the Virtex-II Pro series addresses all existing
connectivity requirements as well as emerging high-speed interface standards.
Xilinx Rocket I/O™ transceivers offer a complete serial interface solution,
supporting 10 Gigabit Ethernet with XAUI, 3GIO, SerialATA, InfiniBand, you
name it. And our SelectI/O™-Ultra supports 840 Mbps LVDS and other parallel
methodologies. Think of it: up to 16 Rocket I/O 3.125 Gbps transceivers at
your disposal, delivering the ultra-high bandwidth you need (40+ Gigabit per
second total serial bandwidth) for real market challenges like optical
networking, high-end broadcast, storage and DSP systems and so much more.


THE POWER OF INTEGRATION

In a single off-the-shelf programmable device, system architects can take
advantage of microprocessors, multi-gigabit transceivers, digital clock managers,
highest density on-chip memory, on-chip termination and more. The result is
a dramatic simplification of board layout, a reduced bill of materials, and
unbeatable time to market.


INDUSTRY-LEADING TOOLS FROM WIND RIVER AND XILINX

Optimized for the PowerPC, Wind River's industry-proven embedded tools are
the premier support for real-time microprocessor and logic designs. And
driving the Virtex-II Pro FPGA is Xilinx's lightning-fast ISE 4.2i software,
the most comprehensive, easy-to-use development system available.


See the new Virtex-II Pro Platform FPGA in detail.
Visit www.xilinx.com/virtex2pro today and step into the era of the
programmable system.


XILINX
The Programmable Logic Company™
Spartan-IIE Family Grows


Building on a tradition of cost-effective speed and
reliability, two new Xilinx Spartan-IIE devices offer
enhanced flexibility and higher densities at the
lowest cost per I/O.


by Rufino Olay
Solutions Marketing Manager
Xilinx, Inc.
rufino.olay@xilinx.com


With today’s challenging economic times, 
making cost-sensitive products such as 
plasma displays, set-top boxes, and broad- 
cast video equipment requires a low-cost 
solution. Additionally, the integration of 
more features in digital consumer products 
often demands more pins than previously 
available. Thus, we were challenged to 
create a low-cost solution that addressed 
the need for a high pin count device. 

Two new additions to the successful
Spartan™-IIE family of devices meet
these exacting criteria. The two devices,
the XC2S400E and the XC2S600E, are
high-density, high-I/O devices that will
allow you to target a wider spectrum of
designs than you previously could with
programmable logic.
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Fourth Generation Spartan FPGAs 

Since introducing the Spartan series more 
than four years ago, Xilinx has delivered 
four generations and shipped more than 40 
million Spartan series FPGAs, with prices 
starting as low as $2.55 per device. With the 
Spartan series, you get the advantages of 
high I/O-count ASICs and gate arrays — 
and you get the added flexibility of a gener- 
al purpose, programmable architecture. You 
also get the benefit of a proven architecture 
with the industry's fastest and most produc- 
tive software tools, plus the most compre- 
hensive offering of IP cores from Xilinx and 
third-party AllianceCORE™ vendors. 

On November 18, 2002, Xilinx
announced an extension to the Spartan
product line to address customer demand for
even higher density and higher I/O-count
devices in the price ranges required for consumer
applications. As shown in Table 1, the
XC2S400E device (400,000 system gates
and up to 410 I/Os) and the XC2S600E
device (600,000 system gates and up to 514
I/Os) give you the ability to integrate more
functionality into a smaller form factor and
still meet your stringent budget requirements.
With the Spartan-IIE family, you get:

• The lowest cost per I/O: The new
  Spartan-IIE devices give you more
  I/Os at much lower prices than any
  competing FPGA.

• Up to 514 I/Os: With the highest
  number of I/Os available in the
  low-cost segment of the FPGA
  industry, Spartan-IIE devices allow
  you to put higher density ASIC
  designs into FPGAs and still keep
  the benefits of reprogrammability.

• Four DLLs: The DLLs allow easy
  clock duplication, quick frequency
  adjustment, faster state machines
  using different clock phases,
  de-skewing of the incoming clock,
  and generation of fast setup and
  hold times or fast clock to outs.

• More than one billion MACs/sec
  per dollar: You can implement high
  performance DSP functionality at the
  lowest cost possible.
Device                      XC2S50E  XC2S100E  XC2S150E  XC2S200E  XC2S300E  XC2S400E  XC2S600E
System Gates                50K      100K      150K      200K      300K      400K      600K
Logic Cells                 1,728    2,700     3,888     5,292     6,912     10,800    15,552
Block RAM Bits              32K      40K       48K       56K       64K       160K      288K
Distributed RAM Bits        24K      37K       54K       73K       96K       150K      216K
DLLs                        4        4         4         4         4         4         4
I/O Standards               19       19        19        19        19        19        19
Max Differential I/O Pairs  83       86        114       120       120       172       205
Max Single Ended I/O        182      202       265       289       329       410       514
Packages                    TQ144    TQ144
                            PQ208    PQ208     PQ208     PQ208     PQ208
                            FT256    FT256     FT256     FT256     FT256     FT256
                                     FG456     FG456     FG456     FG456     FG456     FG456
                                                                             FG676     FG676

Table 1 - Spartan-IIE product matrix


[Chart: ASIC Shipments by I/O Count, in percent, marking the I/O range
covered by the prior Spartan-IIE family and by the Spartan-IIE family
extension. Source: Xilinx]

Figure 1 - Spartan-IIE extensions meet customer demand for more I/O.


More I/Os for Digital Consumer Applications

Moving to advanced technologies has 
always enabled Xilinx to dramatically 
reduce costs and simultaneously bring 
larger density devices within the reach of 
many more cost-conscious customers. 
Now, with the two new Spartan-IIE 
FPGAs, this same advantage is being 
brought into the I/O arena. 

Traditionally, consumer applications 
with more than 305 I/Os have required 
ASICs, as shown in Figure 1. But with the 
introduction of the XC2S400E and 
XC2S600E devices, you now have up to 
410 and 514 I/Os, respectively, and with 
the added advantage of reprogrammability. 


These two new FPGAs deliver a greater 
than 67% increase in I/O capacity over 
previous Spartan-IIE offerings, and up to 
100% more I/Os than competing FPGAs 
in the same density ranges. Additionally, 
the I/Os can be configured as differential 
I/O pairs (up to 205), giving you LVDS 
performance up to 400 Mbps. 


Supporting More I/O Standards 

Today's designs are more complicated than
ever, with the majority typically containing
numerous I/O standards on a single PCB.
With the Spartan-IIE FPGAs you can connect
as many as 19 different I/O standards
on a single chip. This flexibility gives you
the ability to bridge different I/O standards
and protocols, and completely eliminates
the need for costly bus transceivers.


Scalable Footprints

Following the tradition of Xilinx FPGA
families, the Spartan-IIE family continues to
support density migration across common
packages without changing the PC board
footprint. The relative positions of VCC and
GND remain constant across like packages,
unlike competing low-cost FPGAs.

For example, when using a FG456
package, six density members of the
Spartan-IIE family can be interchanged,
providing outstanding flexibility for design
revision, upgrade, or cost optimization, as
shown in Table 2.

Package   Devices available for migration
TQ144     2
PQ208     5
FT256     6
FG456     6
FG676     2

Table 2 - Density migration possibilities


More RAM

The new Spartan-IIE devices differentiate
themselves in the amount of available
memory both in block RAM and distributed
RAM.

The XC2S400E has four columns and
the XC2S600E has six columns of block
RAM, equating to 160K and 288K of
block RAM, respectively. For the XC2S600E,
this is a greater than 4X increase in capacity
over the previous largest density Spartan-IIE
device (64K). With this increase comes the
possibility of storing more data, coefficients,
FIFO functions, and larger general memory
functions in your memory-hungry applications.

Xilinx continues to be the only FPGA
supplier to offer distributed RAM, which is
an ideal solution for designs that require
multiple small, fast, and flexible memories
situated close to the logic. As with all other
Xilinx FPGA families, the 4-input LUT in
Spartan-IIE devices can also be used as
memory, where it can be configured as
ROM, or single-port or dual-port synchronous
RAM.

We've doubled the amount of distrib- 
uted RAM in these new devices to 150K 
for the XC2S400E, and 216K for the 
XC2S600E. 


Software and IP 

The entire Spartan-IIE family is supported
by the Xilinx ISE (Integrated Software
Environment) tool set, which includes the
industry's most advanced timing-driven
implementation tools available for programmable
logic design, along with design
entry, synthesis, and verification capabilities
(www.xilinx.com/ise5/).

There are also more than 200 IP cores,
including PCI, DSP, and other pre-designed
and tested solutions
(www.xilinx.com/ipcenter/) to get your
designs up and running fast.


Processing Solutions on a Budget 

By utilizing the Xilinx MicroBlaze™ 32-bit
field programmable controller option
with the Spartan series devices, you can create
an easy-to-use, low-cost, customized
processing solution. The MicroBlaze
processor is the fastest, most powerful soft
processor and peripheral solution on the
market today for traditional 16-bit and 32-bit
microprocessor and microcontroller
applications.

Coupling the ISE and MicroBlaze solutions
gives you a winning combination
with the benefits of:

• Flexibility: Easily create a customized
  processor design that can be modified
  at any time during the design cycle.

• Guaranteed product availability: You
  can purchase the MicroBlaze source
  code and never have to worry about
  processor obsolescence.

• Reduced system cost: By integrating
  your entire processing solution within
  one device you not only save time and
  effort but you also reduce your bill of
  materials, inventory, and debug time.


Conclusion 
To be successful in this tough, competi- 
tive marketplace, you need an inexpensive 
and flexible design solution. You also 
need fast, reliable performance and the 
lowest cost per I/O. With the expanded 
Spartan-IIE family there is no faster, safer, 
or lower cost way to develop next-genera- 
tion consumer products. 

To obtain a free Spartan-IIE Resource
CD containing a wealth of information on
the XC2S400E and XC2S600E Spartan-IIE
devices, visit www.xilinx.com/spartan2e.
CoolRunner-II Solutions Save Money


With CoolRunner-II CPLD devices,
great things come in small packages.


by Steve Prokosch
CPLD Product Marketing
Xilinx, Inc.
steve.prokosch@xilinx.com


When you're looking for a simple, low cost,
and easy-to-use programmable logic device
that incorporates multiple functions, think
Xilinx CoolRunner™-II CPLDs. These versatile,
nonvolatile devices can save you time
and money on your next design by reducing
board costs and redesigns.

Cost can be thought of in several different 
ways, depending on your point of view. For 
a buyer, it’s the bottom line of a bill of mate- 
rials. For a design engineer, it’s time invested 
and looming deadlines. 

Engineers also face tradeoffs, such as how
fast a product can be designed with a mini- 
mal number of board layouts. Your success 
may rest in the decisions you make while try- 
ing to accomplish this goal. When making 
component choices, it pays to have built-in 
flexibility; with reprogrammable logic, you 
get the best dollar value as well as the ability 
to deliver products ahead of schedule. 

Additionally, engineers must consider 
such factors as single-chip solutions, pack- 
age size, density, versatility, flexible I/O 
structures, and the ability to modify pin 
functionality after placement on the 
board. By considering these items before 
parts selection, you can save costs and still
maintain flexibility.


Single Chip Integration 

If you have unlimited board space, a large
stocking warehouse, and inexpensive test
and assembly costs, some of these cost factors
may not enter into the price equation.
But if you're in a competitive marketplace,
usually one or more of these items will be
scrutinized:

• Power-efficient board size

• Minimum number of parts and suppliers

• Low assembly costs.


Board Size

Typically, the packaging of your product is
defined by board size, which is driven by
the number of components you need to
get the job done. If you can squeeze out
the required functionality and still stay
within the power budget, you have met
your goal.

Don't forget that cost can mean board
space to some engineers and inexpensive
parts to others. As shown in Figure 1, with
CoolRunner-II you can select small BGA
packages such as 56- or 132-ball chip scale
packages (CSP) for high integration or flat
pack (FP) packages for low-cost solutions.
If you are concerned with board size, you
may need unique CSP options. Xilinx also
offers 0.5 mm to 0.8 mm ball spacing
packages that can save you more than 50%
of the board area when compared to similar
I/O count FP package options.

Although the space savings from a
14 mm-by-14 mm, 100-pin FP package
to an 8 mm-by-8 mm, 132-ball CSP
package may seem trivial, consider the
routing involved. With flat packs, all pins
typically route outward from the package.
With BGA packages, routing can be
achieved by running traces between the
adjacent solder balls. These packages also
offer more options when using denser,
multilayer PCBs. This may yield twice
the routing efficiency of a comparable FP
package, further reducing board space.
Thus, the capability of these small packages
goes well beyond the “wow factor” of
their physical size.
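
The footprint comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two package body sizes quoted in the text and treats each package as a plain rectangle; it ignores routing keep-outs and land patterns, which is an assumption of this example, and the function name is illustrative rather than taken from any Xilinx tool.

```python
def footprint_area_mm2(width_mm: float, height_mm: float) -> float:
    """Board area covered by a rectangular package body, in square millimeters."""
    return width_mm * height_mm


# Body sizes quoted in the article.
fp_area = footprint_area_mm2(14, 14)   # 100-pin flat pack
csp_area = footprint_area_mm2(8, 8)    # 132-ball chip scale package

# Fraction of the flat-pack footprint that the CSP frees up.
saving_pct = 100 * (1 - csp_area / fp_area)

print(f"FP:  {fp_area:.0f} mm^2")
print(f"CSP: {csp_area:.0f} mm^2")
print(f"Area saved: {saving_pct:.1f}%")
```

On body area alone the chip scale package covers roughly a third of the flat-pack footprint, while offering 32 more connections.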


Parts and Suppliers 

Lower power consumption can also be
achieved through reduced component
counts. A single low-power CPLD device
improves reliability by reducing the total
number of solder joints, and with it the
chance of cold or weak solder joints that
may cause intermittent failures. The
more solder joints, the higher the chances
of developing manufacturing problems.
Heat dissipation may also be reduced by
fully utilizing a single part instead of
powering multiple parts that may not be
fully utilized. These two factors can have
a direct impact on customer service and
customer reliability ratings.

Figure 2 illustrates the many functions
you can squeeze into a single CPLD and
still get the low power operation you desire.

Maintaining multiple components for 
specific functions can lead to a nightmare 
for procurement. By expanding your sup- 
plier base, you increase demands on many 
different departments within your compa- 
ny and thus lengthen your time-to-market. 
These areas may include accounting, 
shipping and receiving, or component 
engineering. If you have a quality depart- 
ment, they may want reports on each indi- 
vidual device. 

Furthermore, the more devices you
specify, the higher the chance of encountering
a production problem. It would be
devastating to not be able to ship a
multimillion-dollar product due to a $2 part on
back order. By using more parts, you also
run a higher risk of device obsolescence.
This may not cause a delay in shipment,
but it typically costs a board re-layout.

Figure 1 - CoolRunner-II CPLD package offerings


Assembly Costs

The more components shipped to your
contract manufacturer, the more money
you spend in shipping costs. Each electronic
component placed on your board
reflects a direct assembly charge. If your
contract manufacturer charges you to
stock devices, this will also add cost to
your end product.

By keeping the component count
down, you can dramatically reduce both
direct expenses and the indirect cost of
doing business.


Integration and Flexibility

If you need multiple I/O standards for
unique memory devices or CMOS level
translation, conversion devices may be necessary.
Depending on your application, specialty
memory devices may also be required.

If your processor does not support
HSTL or SSTL memory types, you may
need to select voltage referenced to
CMOS translators. In high-volume applications,
these single-function translators
can cost from $4 to $6 in 48-pin packages.
The problem is, they only serve one
purpose. If you don't use all of the pins,
it's wasted board space and power. With a
single CPLD, you get translation coupled
with extra logic capabilities and the freedom
to use pins for other purposes
besides translation.

Figure 2 - Multiple functions in a single CPLD


Even if you don't use any specialty
memory, what about legacy parts that use
different voltage levels than your processor?
Again, you have the choice of purchasing a
single function device that can cost around
$2 for a 48-pin package. A comparable pin
count CPLD can cost half as much as this
single function device and, again, give you
more functionality. So if you need voltage
level translation in the range of 1.5V to
3.3V, CoolRunner-II CPLDs can also provide
this integrated function.

One specialty function that sometimes is
not considered but may prevent board
re-spins is input hysteresis. Schmitt trigger
inverters can cost from $4 to $8 in 20-pin
packages. These devices usually operate from
1.6V to 3.6V, which gives them a wide operating
window. CoolRunner-II CPLDs have
input hysteresis on every input pin. And
because you can configure CoolRunner-II
CPLD input buffers to any voltage from
1.5V to 3.3V, they also have a wide range of
operation. In a head-to-head comparison,
CoolRunner-II CPLDs can cost 75% less
than a discrete Schmitt trigger device.

Also, by using a CPLD solution, you
can enable the input hysteresis, if required;
if not, just leave it disabled. Because you
don't always know if you need hysteresis,
this flexibility may save a new board layout.
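
To illustrate why input hysteresis can prevent a board re-spin, here is a minimal behavioral sketch comparing a single-threshold input with a Schmitt-trigger input on a slow, noisy edge. The threshold voltages and the sample waveform are hypothetical values chosen for the illustration; they are not CoolRunner-II datasheet figures.

```python
def plain_input(samples, threshold):
    """Single-threshold buffer: output follows every crossing of the threshold."""
    return [v >= threshold for v in samples]


def schmitt_input(samples, v_low, v_high, state=False):
    """Input with hysteresis: goes high only at or above v_high and low only
    at or below v_low, so noise between the two thresholds is ignored."""
    out = []
    for v in samples:
        if v >= v_high:
            state = True
        elif v <= v_low:
            state = False
        out.append(state)
    return out


def count_edges(bits):
    """Number of logic transitions in a sampled output."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


# Hypothetical slow rising edge with noise around mid-level (volts).
noisy_edge = [0.0, 0.8, 1.0, 0.85, 1.0, 0.85, 1.1, 1.5, 1.8]

plain_edges = count_edges(plain_input(noisy_edge, threshold=0.9))
schmitt_edges = count_edges(schmitt_input(noisy_edge, v_low=0.6, v_high=1.2))

print("Transitions without hysteresis:", plain_edges)
print("Transitions with hysteresis:   ", schmitt_edges)
```

The plain input reports several spurious transitions on what is really one edge, while the hysteresis input reports exactly one. This is the kind of glitch that otherwise surfaces late, on assembled hardware.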

Moreover, with features such as clock
dividers and doublers (DualEDGE flip-flops),
you can set up independent clock
domains in CoolRunner-II CPLDs, thus
eliminating the need for independent oscillators
or crystals. The devices can handle
fast-running sequential functions such as
pulse width modulators, conversion functions
(BCD to decimal), and serial communications
functions.
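
The idea of deriving several clock domains from one source can be sketched with a small behavioral model. The code below is an illustration only, not HDL for the device: it models a divider as a counter that toggles its output every n/2 rising edges (restricting n to even values is an assumption of this sketch), and it shows that a register sensitive to both clock edges sees twice as many clocking events as one sensitive to rising edges only. All function names are hypothetical.

```python
def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled clock waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


def all_edges(wave):
    """Count every transition; a dual-edge register is clocked on each one."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev != cur)


def divide_clock(wave, n):
    """Divide-by-n model: toggle the output every n/2 input rising edges."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    half = n // 2
    level, count = 0, 0
    out = [level]
    for prev, cur in zip(wave, wave[1:]):
        if prev == 0 and cur == 1:
            count += 1
            if count == half:
                level ^= 1
                count = 0
        out.append(level)
    return out


# Sixteen full cycles of an input clock, sampled twice per cycle.
clock = [0, 1] * 16 + [0]
slow = divide_clock(clock, 4)

print("Input rising edges:     ", rising_edges(clock))
print("Dual-edge clock events: ", all_edges(clock))
print("Divide-by-4 rising edges:", rising_edges(slow))
```

One input clock therefore yields a slower domain and, through dual-edge clocking, an effectively doubled event rate, which is what lets a single oscillator serve several functions.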


Conclusion 

Due to their multifunctional nature, Xilinx
CPLDs can integrate many applications to
save costs in your design. The high-performance,
low-power CoolRunner-II
CPLDs can reduce the number of board
redesigns, minimize the total number of
devices, and increase overall flexibility. This
will have a direct impact on bringing your
product to market faster.

To get you started with CPLDs, Xilinx
offers multiple aids, including beginner
tutorials with demo boards and reference
designs that include detailed application
notes with HDL code. Some design examples
include SMBus, I²C, SPI, and processor
interfaces. You can also look at full-up
reference designs, such as designing an
MP3 player. Whatever your level of experience,
Xilinx makes it easy to use reprogrammable
logic.
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We make your ideas work. 


Array Electronics has been providing 
advanced FPGA consulting services since 
Reinventing the Signal Processor


FPGAs are ideal for building high-performance,
reconfigurable signal processing systems such
as software defined radios.

by Chris Dick, Ph.D.
Chief DSP Architect and Director
Signal Processing Engineering

The ultimate goal in software radio has
been the realization of an agile radio that
can transmit and receive at any carrier frequency
using any protocol, all of which can
be reprogrammed virtually instantaneously.

The Software Defined Radio Forum
(SDRF) (www.sdrforum.org), an organization
dedicated to supporting the development,
deployment, and use of open
architectures for advanced wireless systems,
defines software defined radios (a term
coined by Joe Mitola in 1991 [1]) as radios
that provide software control of a variety of
modulation techniques. These include
wide-band or narrow-band operation,
communications security functions (such as
hopping), and waveform requirements over
a broad frequency range.

Figure 1 shows the architecture of a 
generic software radio. Smart antenna array 
technology is used for both the receive and 
transmit paths in the system. On the receive 
side, multiple high-bandwidth digitized 
antenna data is channelized, converted to 
baseband, and filtered — typically the sample 
rate is adjusted at this node. Other sections 
of the radio’s physical layer (PHY) perform 
demodulation, synchronization, multiuser 
detection, adaptive interference cancellation, 
source decoding, forward error correction, 
beam forming, and adaptive equalization. 

All of these computations present significant
challenges for the radio PHY signal
processing engine. Furthermore, much of
the processing occurs at very high data rates.


Demands for Configurability and Agility 


One of the driving objectives underlying
SDR concepts is the desire to have a single
hardware platform capable of servicing a
number of radio environments. This type of
reconfigurability could be used in several
ways. For example, manufacturers developing
infrastructure equipment or network
operators building out a network could
deploy a soft radio system in Europe configured
to support Universal Mobile
Telecommunications System (UMTS) or
Global System for Mobile Communication
(GSM) standards, or operate the system in
the U.S. with a Code Division Multiple
Access (CDMA) 2000 radio personality
profile. This one system could also be operated
as a multimode radio in an environment
that employs both wideband and
narrowband CDMA communications.

Figure 1 - Generic software radio architecture showing adaptive antenna array, analog signal processing, digitization, and the radio signal processing PHY.
(Block labels: ADC Array, DSP Radio PHY, Network. MUD = Multiuser Detection; ICU = Interference Cancellation Unit; FEC = Forward Error Correction.)

Radio agility is important in situations
where standards are fluid. For example, consider
the evolution of the 3GPP standard
and the length of time required for that
standard to stabilize. Agility is also important
during transition periods. As we move
from 2G to 3G mobile cellular systems,
multiple standards such as Personal Digital
Communication System (PCS), GSM,
IS-95, Personal Handyphone System
(PHS), DECT, EDGE, GPRS, IMT-2000,
and CDMA2000 must all coexist.

Multistandard support will be a fact of life
for the foreseeable future. When the 4G
wireless network build-out is completed,
multimode operation will be required to support
third-generation wireless direct
sequence spread spectrum (DSSS) and
orthogonal frequency division multiplexing
(OFDM), the modulation scheme most likely
to be deployed in 4G systems. From a network
operator perspective, base transceiver
station (BTS) configurability could be used
to dynamically allocate radio resources. This
might occur on the time-scale of hours in
order to provide the highest quality of service
to the subscriber base at any given time.

Both manufacturers and network operators
could also use a configurable BTS to
permit field upgrades or bug fixes to equipment
already deployed, by supplying a new
BTS profile using the Internet or a
microwave link from a radio network controller
to the BTS. Soft radios can also be
viewed as a means to protect infrastructure
investments by keeping radio hardware from
becoming obsolete as new standards and
techniques become available.


Economics and the Effect on Software
Radio Development

Although commercial technology and economics
have always been inextricably
linked, significant changes are occurring in
both of these domains that will alter the way
electronic equipment, including software
radios, is developed and deployed. From a
purely technological perspective, the
International Technology Roadmap for
Semiconductors (ITRS) shows that Moore's
Law will remain in effect for at least another
15 years, and that in the year 2016 devices
will be produced on a 22 nanometer node.

Yet Patrick Gelsinger, Intel's chief technology
officer, announced at last year's
International Solid-State Circuits
Conference (ISSCC 2001) plans for a 20 or
30 nanometer process in 2010, delivering a
device consisting of 2 billion transistors
operating at a clock frequency of 30 GHz.
The estimated power consumption of the
device would be 3 to 5 kW, or a power density
of 1 kW/square centimeter, about the
same as a rocket nozzle. This has obvious
thermal implications that must be dealt
with using techniques radically different
from today's methods.


Instruction set architecture (ISA) signal
processors share many similarities with general
purpose processors. Architectural differentiators
such as very long instruction word
(VLIW), super-scalar extensions, and various
types of predictive enhancements are
really micro-architecture evolutions of the
basic architecture credited to von Neumann
and his colleagues in the 1940s and 1950s.
As such, signal microprocessors have gained
most of their performance from a raw
increase in clock frequencies. For example,
in the early 1980s the first fixed-point signal
processors supported clock frequencies
in the 5- to 10-MHz region. Current-generation
high-end ISA Digital Signal
Processors (DSPs) use 600-MHz clocks and
are on a trajectory to the gigahertz region.
Obviously, this curve has the same thermal
pitfalls described above.

Although FPGAs take advantage of
Moore's Law (and other advanced process
technology such as all-copper interconnect
and low-K dielectric substrates) to provide
increased clock frequencies over time, their
primary mechanism for supplying performance
is completely different than the ISA
approach. FPGAs exploit the large amount
of parallelism inherent in most signal processing
algorithms. With as many as 556
embedded multipliers and 125,136 logic
cells in the Xilinx Virtex-II Pro™ Platform
FPGA, we can readily see how these devices
can be viewed as a naturally parallel processing
engine that can take advantage of
the rich parallelism in a software radio PHY.
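
A rough way to see the effect of parallelism is to compare peak multiply-accumulate (MAC) rates, which scale with the number of arithmetic units times the clock rate. In the sketch below, only the 556-multiplier count and the 600-MHz DSP clock come from the text; the 150-MHz fabric clock and the two MAC units per DSP are assumptions picked purely for illustration, and real sustained throughput depends on the algorithm and data movement.

```python
def peak_macs_per_second(mac_units: int, clock_hz: int) -> int:
    """Upper bound on MAC throughput: every unit completes one MAC per clock."""
    return mac_units * clock_hz


# From the article: up to 556 embedded multipliers in the largest device.
# Assumed for illustration: a 150 MHz clock for the multiplier array.
fpga_peak = peak_macs_per_second(556, 150_000_000)

# From the article: 600 MHz ISA DSP clock.
# Assumed for illustration: 2 MAC units in the DSP datapath.
dsp_peak = peak_macs_per_second(2, 600_000_000)

ratio = fpga_peak / dsp_peak

print(f"FPGA peak: {fpga_peak / 1e9:.1f} GMAC/s")
print(f"DSP peak:  {dsp_peak / 1e9:.1f} GMAC/s")
print(f"Ratio:     {ratio:.1f}x")
```

Even at a quarter of the DSP clock rate, the wide array comes out well ahead on peak throughput under these assumptions, which is the sense in which the FPGA does not rely on raw clock speed.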

The software radio PHY is a complex
signal processing system in which algorithmic
and functional-level parallelism can be
leveraged to realize a high-performance
system that does not rely on raw speed
for its performance. The multiplier array
could be used to implement space-time
processing in a receiver, while at the
functional level multiple turbo convolutional
decoders could be operating concurrently
to support multiple users, each with a
2 Mbps data rate in a 3G environment.

Manufacturers have 
made 60% more transis- 
tors available to circuit 
designers per area of silicon 
compared with what was 
available a year earlier. In contrast, the rate
at which designers are able to utilize transistors
in circuits of any given tier of complexity
has only been increasing by 20%
per year [3].
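
The compounding effect of these two growth rates can be illustrated with a short calculation. The sketch below is only an illustration: it assumes that transistor supply and designer utilization start from the same baseline in year 0 and that the 60% and 20% annual rates stay constant, neither of which is stated in the text. The function name `design_gap` is hypothetical.

```python
def design_gap(years, supply_growth=0.60, usage_growth=0.20):
    """Ratio of available transistors to usable transistors after `years`.

    Assumes both quantities start equal (ratio 1.0) and grow at constant
    annual rates; these are illustrative assumptions, not source data.
    """
    supply = (1.0 + supply_growth) ** years
    usage = (1.0 + usage_growth) ** years
    return supply / usage


for year in range(6):
    print(f"year {year}: gap factor {design_gap(year):.2f}")
```

Under these assumptions the gap widens by a factor of about 1.33 each year, so it more than quadruples within five years.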

This “design gap” is associated with performance
supply and demand, but another
aspect is methodology related. Developing and
verifying a sophisticated ASIC is becoming an
increasingly complex, time-consuming, and
error-prone procedure. Furthermore, at
$1-$2 million for a mask set, it is
becoming prohibitively expensive. ASIC
development timelines now span
years, and may even extend beyond the window
of opportunity for the intended product.
Harvard Business School Professor
Clayton Christensen highlights that while
price and performance are still important,
there are signs that a seismic shift is taking
place, leading to a new era where other factors,
such as customization, matter more
[4]. This is precisely where the FPGA fits in:
it is the ultimate in customization.

FPGAs address the technical as well as
business perspectives outlined above. Because
they are off-the-shelf commodity items, companies
can access state-of-the-art device technology
with minimal NRE, and quickly
build and deploy customized systems, achieving
very short time-to-market while simultaneously
maximizing first-to-market revenue
streams. The Virtex-II Pro platform FPGA
shown in Figure 2 is the cornerstone technology
for building high-performance
reconfigurable signal processing systems,
which include the PHY in an SDR. In conjunction
with the logic fabric and active
interconnect, this device has an array of
embedded multipliers for supporting the
most demanding of arithmetic tasks in a
radio PHY. This particular FPGA family
also offers integrated PowerPC 405 technology,
multi-gigabit transceivers, and
dynamic impedance matching capability on
the device I/O ports that can be used to simplify
printed circuit board design and manufacturing.
Using a platform-based
approach to system implementation, system
designers can create product differentiation
by implementing signal processing functions
in the logic fabric as well as through
embedded software running on a PowerPC.

Figure 2 - Virtex-II Pro Platform FPGA showing the multiplier array for supporting
parallel signal processing, multi-gigabit transceivers for inter-chip and inter-system
connectivity, and embedded RISC processor technology for performing decision-oriented
tasks and running a real-time operating system.


The DSP Dilemma 


One approach to BTS implementation
has been to employ a combination of ASICs
and ISA DSPs. The ASIC technology is typically
used to address the significant arithmetic
requirements of the radio front-end,
such as digital down conversion and channelization
filters to support multicarrier
W-CDMA or CDMA2000 standards. These
functions are beyond the capabilities of even
state-of-the-art ISA DSPs.
Even though the ASICs used
in this part of the system may
offer some programmability, it
is generally limited in nature
and is certainly a departure
from the intended philosophy
of the fully configurable soft
radio. DSP processors might
be used for certain baseband
functions such as source (de-)encoding,
as with a CELP
codec. Reduced Instruction
Set Computer (RISC) processing
resources in the system could
also support the requirements
of the higher levels in the protocol
stack.

From a soft radio perspec- 
tive, the ASIC/processor com- 
bination is poor partitioning 
from both a flexibility and efficiency stand- 
point. In recent years FPGAs have experi- 
enced hyper-growth in both arithmetic 
complexity and compute density (number of 
operations/unit area of FPGA) that can be 
achieved by current generation devices. What 
types of signal processing functions can be 
usefully realized by an FPGA? 

Radio designers working with FPGA 
technology implement IF sampled receivers, 
channelizers of different varieties including 
classical digital down (and up) conversion 
(DDC and DUC) architectures, FFT-based 
polyphase transforms, multistage multirate 
polyphase decimators and interpolators, 
adaptive interference cancellers for DSSS 
channels, multiuser detection (MUD), and 
rake receivers (including acquisition and
tracking). More recently, FPGAs have been 
used to construct space-time processors for 
advanced smart antenna systems. FPGAs are 
extremely adept and flexible at implement- 
ing FFTs, and this functionality has been 
used to construct OFDM modulators and 
demodulators. 

FPGAs have also found extensive use in 
narrowband bandwidth-efficient Quadrature 
Amplitude Modulation (QAM) systems. In 
this environment they have been used to
implement adaptive channel equalizers, digital
timing recovery circuits, carrier recovery
loops, frequency locked loops, and fractional
rate change filters.


FPGAs are also extensively used for forward error
correction in communication systems. For example,
OC-3 155 Mbps Viterbi decoders, Reed-Solomon
decoding at OC-192 10 Gbps data rates, and Viterbi
(de-)interleavers operating at clock frequencies greater
than 200 MHz are all achievable with current generation
Virtex-II FPGAs.

FPGAs are the ultimate device technology in terms
of user customization. They allow system architects to
perform area-performance tradeoffs and to therefore
“right-size” the functional
components in the system. FFTs with execution
times in the microsecond to tens-of-microseconds
range are possible. In the context of
an OFDM communication system, a small
number of FPGA resources could be used to
realize a (de-)modulator that supports a moderate
data rate, or by using more resources an
extremely high-performance high data-rate
link could be realized.

With FPGA technology, control of the sil- 
icon is put back into the hands of the system 
developer rather than the chip architect — as 
is the case with an ISA signal processor. In 
fact, one way to view an FPGA is as a minia- 
ture silicon foundry with turnaround times of 
hours rather than months. 

Leveraging these types of tradeoffs does
not always mean that the engineering team
has to construct the functional units from
first principles. To facilitate rapid product
development, many signal processing functions
are available from the FPGA manufacturers
themselves and from third-party
intellectual property (IP) suppliers. FFTs,
multirate filters, Viterbi decoders, and
Reed-Solomon encoders and decoders are all
available as pre-verified IP from Xilinx
(www.xilinx.com/xlnx/xil_prodcat_landingpage.jsp?title=Xilinx+DSP).

One of the roadblocks to the widespread
deployment of FPGA-based signal processing
has been design methodology related. In the
past, FPGA-DSP design has required signal
processing and communication engineers to
use tool flows and languages with which
they are typically unfamiliar. The introduction
of tools like System Generator for DSP
(www.xilinx.com/xlnx/xil_prodcat_product.jsp?title=system_generator)
has gone a long
way to let engineers work in the language of
the problem. In this case the system is
developed using a visual dataflow paradigm
in The MathWorks Simulink environment.
The approach not only allows the design to
be specified, simulated, and parameterized,
but it also enables design reuse through the
use of IP cores.


The Reconstruction of the Software Radio
The platform FPGA provides an opportunity
for the radio architect to reinvent the system.
Instead of having a radio card that is
responsible for the DSP heavy lifting at the
front-end of a soft radio system, and then
passing this partially processed data over a
VME or PCI-X bus to a baseband processor,
multiple functions could be integrated into
one or a small number of platform FPGAs.
As shown in Figure 3, compute-intensive
tasks in the radio PHY could be implemented
in the FPGA logic fabric, while
more decision- and control-oriented tasks
are run as embedded software on the
PowerPC™. This embedded processor could
even be used to run a Node B
application protocol (NBAP)
for a BTS, to host a Java virtual
machine, or even to provide
CORBA support.

Figure 3 - Platform FPGA approach to software-defined radio realization. The high MIPs
processing is implemented in the logic fabric, while decision-oriented and non-real-time
tasks are provided as embedded software running on the PowerPC. The multi-gigabit
transceivers could be used for providing connectivity to the broader network.

Advances in analog-to-digital
converter technology
are still required to support
the high dynamic range
requirements of wideband
radio front-ends that offer
true multimode global operability.
Advances are also
required in the area of configurable
high-bandwidth analog
signal processing for
realizing the RF and IF stages
of a radio. Micro-electro-mechanical
systems (MEMS)
appear to be a promising
technology for addressing
in-system configurable analog signal processing.
As this technology matures and is
combined on a single platform with digital
functionality, the ideal of a completely configurable
radio will move closer to reality.

The significant computation demands of
the SDR PHY have been largely satisfied by
highly parallel signal processing platforms
realized using recent generation FPGA technology
from companies like Xilinx. To complement
the device technology,
signal processing IP libraries
and design methodologies such as System
Generator for DSP are taking on renewed
roles in providing a solution to the challenges
presented by the software radio.

(Note: A useful primer for engineers
and executives interested in developing
products in the SDR application space
can be found on the SDRF’s web site,
www.sdrforum.org/sdr_primer.html.)


Wireless communication has created a continuing
demand for increased bandwidth
and better quality of service. With the
ever-increasing number of mobile network
subscribers, available capacity is
increasingly at a premium.

“Smart” antenna arrays are one way to
accommodate this increasing demand for
bandwidth and quality. These antenna
arrays provide numerous benefits to service
providers. However, the processing requirements
for smart antenna arrays are many
orders of magnitude greater than those for
single antenna implementations.

In this article, we describe how
smart antenna arrays work and present a
new product from Nallatech™ that
combines a 20-channel data acquisition
system with an FPGA computing fabric
for handling the high-performance digital
signal processing (DSP) operations. We
also show you how this combined product
is integrated into a scalable system
using Xilinx Internet Reconfigurable
Logic (IRL™) technology for remote
configuration and control of the system
using Nallatech’s field upgrade systems
environment (FUSE™) software.


Focus Power with Smart Antennas 

Figure 1 shows a conventional antenna as 
omnidirectional. It radiates and receives 
information equally in all directions. This 
equal distribution leads to power being 
transmitted to, but not received by, the 
user. This wasted power becomes potential 
interference to other users or to other base 
stations in other cells. Interference and 
noise reduce the signal-to-noise ratio used 
by the detection and demodulation opera- 
tions, resulting in poor signal quality. 

To overcome the problems associated 
with omnidirectional arrays, smart anten- 
nas focus all transmitted power to the user 
and only “look” in the direction of the 
user for the received signal. This ensures 
that the user receives the optimum quality 
of service and maximum coverage for a 
base station. An intermediate step to this 
ideal is using directional antennas that 
divide the 360-degree coverage into sec- 
tors. As shown in Figure 2, four direction- 
al antennas can each cover approximately 
90 degrees. 

Instead of using individual antennas, 
we can create a smart antenna array and 
add further processing intelligence to the 
data received or transmitted with this 
array. Smart antenna arrays enable us to 
direct beams in specific directions through 
electronic or software control. 

Two types of smart antenna arrays are
switched-beam arrays and adaptive arrays.
As shown in Figure 3, switched-beam
arrays comprise a number of predefined
beams. The control system switches among
the beams and selects the beam that
provides the maximum signal response.

Figure 1 - Omnidirectional antenna

Figure 2 - Sectorized antenna with four sectors

Adaptive antenna arrays, on the other 
hand, incorporate more intelligence into 
their control system than do switched- 
beam arrays. Adaptive antennas monitor 
their environment and, in particular, the 
response of the data path between the user 
and the base station. This information is 
then used to adjust the gains of the data 
received or transmitted from the array to 
maximize the response for the user. With 
adaptive antenna arrays, the control sys- 
tem has full flexibility and determines 
how the gains of the arrays are adjusted. 
By adjusting the gains in this way, the 
control system can — in addition to maxi- 
mizing the gain from a particular user — 
also attenuate the signal from an interfer- 
ing source, such as from another user or 
from multipath signals. Therefore, as 
shown in Figure 4, adaptive arrays maxi- 
mize the signal-to-interference-plus-noise 
ratio (SINR) and not just the signal-to- 
noise ratio (SNR). 






This dynamic adaptation of the antenna
array response provides focused beams to
specific users and a new mechanism for
multiuser access to the base station.
Conventionally, multiple users are separated
when accessing the base station by
using different frequencies, as in frequency
division multiple access (FDMA). FDMA
is used in advanced mobile phone services
(AMPS) and total access communications
systems (TACS). Users can also be separated in
time, as in time division multiple access
(TDMA), which is used in the global system for mobile
communications (GSM) and interim standard 136
(IS-136), or by code, as in code division multiple access
(CDMA), which is used in third generation
(3G) systems.

Figure 3 - Switched antenna array
with active beam highlighted

Figure 4 - Adaptive antenna
array enhancing the SINR

Figure 5 - SDMA allows two users to access
the same base station on the same frequency.


As shown in Figure 5, by using smart 
antenna arrays, we can now use space divi- 
sion multiple access (SDMA). In this case, 
users may use the same frequency, time, or 
code allocations over the air interface and 
only be separated spatially. This enables 
SDMA to be a complementary scheme to 
FDMA, TDMA, and CDMA, and SDMA 
thus provides increased capacity within
congested areas.


Smart Antenna Processing 
A fully adaptive antenna array implementation
requires a considerable increase in processing
capability. Previously, we had a
single stream of data coming from a single
antenna; now, we have multiple data streams to
process. As shown in Figure 6, the data flow
for a beam-forming application no longer has
a single input data stream: N data streams,
one from each of the N antenna elements,
must be processed.

Figure 6 - Data flow for beam-forming

The fundamental operation carried out
in adaptive arrays is to pass the data
stream from each antenna through an 
adaptive finite impulse response (FIR) 
filter. Note that in narrowband applica- 
tions, the adaptive FIR filters simplify to 
a single weight vector. The processing 
requirements increase, however, with each 
beam processed. 
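
The narrowband case described above, where each antenna stream is multiplied by a single complex weight and the results are summed, can be sketched as follows. This is a minimal illustration, not the article's implementation: it assumes a uniform linear array with half-wavelength spacing and uses fixed conjugate steering weights in place of an adaptive weight-update algorithm. All function names and parameters are hypothetical.

```python
import cmath
import math


def plane_wave_snapshot(num_antennas, angle_deg, spacing_wavelengths=0.5):
    """One complex sample per antenna for a unit plane wave from angle_deg.

    Assumes a uniform linear array (an illustrative assumption).
    """
    phase_step = (2.0 * math.pi * spacing_wavelengths
                  * math.sin(math.radians(angle_deg)))
    return [cmath.exp(1j * phase_step * n) for n in range(num_antennas)]


def steering_weights(num_antennas, angle_deg, spacing_wavelengths=0.5):
    """Fixed weights that point the beam at angle_deg (not adaptive)."""
    snapshot = plane_wave_snapshot(num_antennas, angle_deg, spacing_wavelengths)
    return [x.conjugate() / num_antennas for x in snapshot]


def beamform(samples, weights):
    """Narrowband beamformer: one complex multiply-accumulate per antenna."""
    return sum(w * x for w, x in zip(weights, samples))


weights = steering_weights(4, 20.0)
on_target = beamform(plane_wave_snapshot(4, 20.0), weights)
off_target = beamform(plane_wave_snapshot(4, -40.0), weights)
print(abs(on_target), abs(off_target))
```

With these assumed weights, a signal arriving from the steered direction passes at unit gain while a signal from a different direction is attenuated. An adaptive array would update the weights continuously rather than fixing them.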

If we consider a simple example where
we have four antennas and a narrowband
system, such that each adaptive filter reduces
to a single multiplication, we can see that
the processing requirements approach one-half
billion multiply-accumulates (MACs)
per second for a sample rate of 105 mega
samples per second. This figure is for
a single beam and does not include the
processing requirements for the adaptive
update algorithm. This amount of processing
does not seem unreasonable for a
DSP processor. However, if
we want to support multiple beams and
achieve finer beams by increasing the number
of antennas, we could quickly exhaust
the processing capability of a standard
processor architecture as we reach processing
requirements of several billion MACs
per second.
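
The arithmetic behind these figures can be written out as a short calculation. This sketch assumes that each weight multiplication counts as one MAC, that the rate scales linearly with the number of beams, and that the adaptive update is excluded. The function name and the eight-antenna, four-beam configuration are hypothetical examples, not figures from the article.

```python
def mac_rate(num_antennas, sample_rate_hz, num_beams=1, taps_per_antenna=1):
    """MACs per second for weight-and-sum beamforming.

    Assumes one MAC per tap per antenna per beam per sample; the adaptive
    weight-update algorithm is not included.
    """
    return num_antennas * taps_per_antenna * num_beams * sample_rate_hz


single_beam = mac_rate(4, 105_000_000)             # four antennas, one beam
larger_array = mac_rate(8, 105_000_000, num_beams=4)  # hypothetical example
print(single_beam, larger_array)
```

Four antennas at 105 mega samples per second give 420 million MACs per second, which matches "approaching one-half billion". The hypothetical eight-antenna, four-beam case already needs 3.36 billion MACs per second.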

By using FPGAs, we have powerful 
DSP devices for handling these high- 
performance requirements at sampled data 
rates. Furthermore, we can take advantage 
of the FPGA flexibility for directly han- 
dling acquisition control and other DSP 
functions, such as digital down-conversion,
demodulation, and matched filtering.


20-Channel Data Acquisition 
Figure 7 shows the Nallatech BenADIC™
data acquisition card, which can simultaneously
capture data from 20 sources at a sustained
rate of 105 mega samples per second.
The analog inputs have a 250 MHz
bandwidth and the data is digitized at 14-bit
resolution. The card produces 3.675
gigabytes of digitized data every second, or the
equivalent of 5.4 audio CDs, for processing.
The large number of tightly coupled input
channels makes the BenADIC card ideal for
processing smart antenna arrays. As shown in
Figure 8, the 20 input channels are partitioned
into five groups of four channels. The
analog-to-digital converters (ADCs)
in each of these groups are connected to their
own Xilinx Virtex™-E FPGA. This enables
local processing of the four channels. Thus,
the architecture can be
arranged to handle five antenna arrays,
each with four antennas within the array.
Alternatively, the high-speed internal
buses enable these groups to be interconnected
to handle an array of 20
antennas.

Figure 7 - Xilinx Virtex-E FPGAs
on a BenADIC 20-channel,
14-bit data acquisition card
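
The 3.675 gigabyte-per-second figure quoted for the card follows directly from the channel count, sample rate, and resolution. The sketch below reproduces it, assuming samples are counted at exactly 14 bits each with no padding or framing overhead. The function name is hypothetical.

```python
def acquisition_rate_bytes(channels, sample_rate_hz, bits_per_sample):
    """Raw digitized data rate in bytes per second (no packing overhead)."""
    return channels * sample_rate_hz * bits_per_sample / 8


card_rate = acquisition_rate_bytes(20, 105_000_000, 14)
group_rate = acquisition_rate_bytes(4, 105_000_000, 14)
print(card_rate / 1e9, "GB/s for the card")
print(group_rate / 1e6, "MB/s per four-channel group")
```

Under the same assumption, each four-channel group delivers 735 megabytes per second to its local FPGA. The "5.4 audio CDs" comparison implies a disc capacity of roughly 680 megabytes.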

In addition to the channel 
group FPGAs, a large FPGA can 
handle further processing and 
communicate with the compact 
PCI (cPCI) backplane and the 
PCI bus (via the interface 
FPGA). Communications over 
the cPCI backplane allow data
transfer to other cards in the system,
such as to the Nallatech
DIME-II™-based BenADIC
card, which can accept DIME-II
modules and provide more than
50 million system gates with the
Virtex-II family for Xilinx
XtremeDSP™ operations.

The BenADIC card is 
compatible with the tools and 
cores produced by the Xilinx 
DSP group and includes the 
powerful System Generator 
tool. By using System 
Generator, you can develop
and verify your algorithms
within the MATLAB™ and
Simulink™ environments.
You can then synthesize and
implement your design for the
FPGA. This implementation
exercise has been carried out
successfully for the BenADIC
using System Generator.

Xilinx FPGAs Allow for Software-Defined Radio
The great thing about Xilinx FPGAs is their 
ability to be reprogrammed on the fly and to 
give hardware different personalities based 
on the application. Nallatech has been 
implementing dynamically reconfigurable 
FPGA systems for a number of years. The 
BenADIC card is Nallatech’s newest product. 
The BenADIC card is compatible with 
the Nallatech FUSE software environment, 
which provides the capability to selectively 
and dynamically change the operation of 
FPGAs in the BenADIC card or other 
FPGA-based systems, including modular 
DIME systems. This ability to dynamically 
update a system leads to the definition of a 
software-defined radio where the receiver
characteristics are controlled via software.
By using the FUSE software, this control
can be handled locally over a PCI bus or
remotely via TCP/IP, for example.


Conclusion 

The BenADIC card from Nallatech cou- 
ples the power of Xilinx FPGAs with a 
highly integrated 20-channel data acquisi- 
tion system on a single card. It is ideally 
suited for handling large, smart antenna 
arrays or several smaller arrays. 

The combination of the FPGA per- 
formance and flexibility enables the realiza- 
tion of advanced DSP algorithms, which in 
turn opens the possibility for deploying 
advanced wireless interfaces. Deploying 
advanced wireless interfaces provides users 
with a better quality of service and gives
service providers greater capacity.

Figure 8 - Block diagram of a BenADIC card


FPGAs Have the Multiprocessing I/O Infrastructure to Meet
3G Base Station Design Goals

Two-dimensional fabric efficiently links
arrays of processors inside Virtex-II Pro
devices to enable parallel processing of data.

by Peter Galicki, CEO
CrossBow Technologies Inc.
peter.galicki@crossbowip.com


With increased data traffic and new
multiuser detection and adaptive beamforming
algorithms, data processing
requirements of 3G base stations will
increase by as much as 100 times relative
to current equipment. This increase in
processing capacity must be matched by
low power consumption, as the new picocell
base stations mounted on building
sides will not be using forced air cooling.
Arrays of small specialized
processors (Figures 1 and 2) will provide
a power-efficient method of increasing
performance, more so than can be
obtained by increasing the features of
larger, general-purpose super processors.

The evolution of current standards and
introduction of new standards currently
force base station operators to perform
frequent upgrades to their wireless infrastructure,
often requiring board replacements.
To reduce field maintenance, 3G
equipment must be upgradable without
board swapping. The high cost of 3G
equipment often leaves wireless infrastructure
manufacturers with thin profit
margins; costs will have to come down to
enable large-scale deployment.

Finally, OEMs cannot abandon their
current design methods to start designing
3G equipment from scratch. They must be
able to reuse semiconductor IP, code, and
development tools to hit market windows
and to obtain a return on investment.

Figure 1 - Arrays of small specialized processors reduce power consumption.

Figure 2 - Arrays of small specialized processors increase performance
(figure labels: 8 data operations per cycle; 256 data operations per cycle).


Meeting these seemingly exclusive goals
requires a combination of top-notch process
technology, a comprehensive
component library, and efficient data
communication methods, both inside the chip and
chip-to-chip. Reaching 3G design goals hinges on
achieving the right balance between the size
and number of data processing components
to keep most of the chip busy at all times,
while reducing the overall distance that the
data has to travel inside ICs. System efficiency
is heavily influenced by design partitioning,
optimization of individual data
processing components, and streamlining
data flow between components. To keep data
processing components busy, inter-processor
data transfers must have low latency and be
precisely deterministic; otherwise components
will waste valuable processing cycles
while waiting for data.

Low latency requires the removal of data
communication bottlenecks by spreading
the data flow over the entire area of a chip.
Transfer determinism is achieved by a combination
of low latency and a uniform data
communications structure inside the chip.


Data Processing Elements 

Each 3G chip is likely to contain hundreds 
of processing elements, many representing 
autonomous processors with their own data 
processing flows, control flows, memory, 
and communications ports. Some data processing
flows may be augmented with dedicated
DSP blocks. Virtex™-II FPGAs support
DSP functions and a MicroBlaze™
soft processor. Virtex-II Pro™ devices also 
feature embedded IBM PowerPC™ proces- 
sors. Depending on the task at hand, small- 
er processors may be better suited for 
simpler functions, and larger processors may 
be a better fit for more complex algorithms. 

In order to work in parallel, processors 
must be able to easily communicate with 
each other. An efficient way for processors 
to communicate is through a fabric dis- 
persed across the entire design that looks to 
individual processors like conventional 
memory (Figure 3). This approach enables 
each processing element to be developed 
and verified individually, yet easily
exchange data with other processors.


Two-Dimensional Data Communications 

An effective data interconnect fabric must 
support low latency and deterministic data 
transfers occurring simultaneously among 
multiple processing elements. It must also 
be flexible and scalable to allow for the 
addition of new elements or the removal of 
unwanted elements without affecting the 
rest of the design. Finally, it should be compatible
with existing processors and be as
easy to use as accessing memory.


Memory-Like Interface 

Using conventional bus cycles to transfer 
data between processors dispenses with 
exotic and hard-to-implement communi- 
cations peripherals and protocols in favor 
of a simple memory-like interface. As 
shown in Figure 4, 2D-fabric from 
CrossBow appears to processors as a mem- 
ory-mapped peripheral on an IBM 
CoreConnect™ bus. PowerPC and 
MicroBlaze processors can issue conven- 
tional read and/or write bus cycles to their 
local 2D-fabric peripherals to communi- 
cate with other processors on the chip 
(Figures 5 and 6). The payload for each 
transfer is derived from the data bus. The 
destination location and the initial direc- 
tion of travel are derived from the address 
bus. The transfers are totally transparent to
the sending and receiving processors, which
launch transfers with write cycles and
terminate them with read cycles.
Routing of data from source to destination,
as well as arbitration with other data traffic,
is performed autonomously by the interconnected
2D-fabric peripherals.
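
The idea that a single write cycle carries everything a packet needs, with destination and initial direction in the address and payload on the data bus, can be sketched as follows. The bit layout here is entirely hypothetical and chosen only for illustration; the article does not describe the actual CrossBow address map. The names `Packet`, `encode_address`, and `write_cycle_to_packet` are likewise assumptions.

```python
from dataclasses import dataclass

# Hypothetical encoding of the initial direction of travel.
DIRECTIONS = ("north", "east", "south", "west")


@dataclass(frozen=True)
class Packet:
    dest_x: int
    dest_y: int
    direction: str
    payload: int


def encode_address(base, dest_x, dest_y, direction):
    """Build a word-aligned write address (assumed layout, not CrossBow's).

    Word offset bits [3:0] = X, [7:4] = Y, [9:8] = initial direction.
    """
    offset = (DIRECTIONS.index(direction) << 8) | (dest_y << 4) | dest_x
    return base + (offset << 2)


def write_cycle_to_packet(base, address, data):
    """Convert one bus write cycle (address + data) into one packet."""
    offset = (address - base) >> 2
    return Packet(
        dest_x=offset & 0xF,
        dest_y=(offset >> 4) & 0xF,
        direction=DIRECTIONS[(offset >> 8) & 0x3],
        payload=data & 0xFFFFFFFF,
    )


addr = encode_address(0x80000000, dest_x=3, dest_y=2, direction="east")
packet = write_cycle_to_packet(0x80000000, addr, 0xDEADBEEF)
print(hex(addr), packet)
```

Because the destination is encoded in the address, the sending processor needs no special communication instructions. An ordinary store to the right memory-mapped location is enough to launch a transfer.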


A 2D Array of Data Transport Links 
Efficient 3G designs will feature global
communication fabrics using single sets of
lines to transfer all kinds of data, including
payloads, control words, and configuration
data. Duplication of data transfer lines
reduces overall system efficiency.

As shown in Figure 7, 2D-fabric peripherals
of adjacent processors are interconnected
with a single mesh of horizontal and
vertical data transport links. Individual bus
cycles are autonomously converted to small
packets that travel between source processors
and destination processors through chains of
2D-fabric peripherals of the intermediate
processors along the way. Short point-to-point
links reduce power consumption.
Small packets with single word payloads
reduce data transfer latencies, enabling data
and control packets to share common transfer
lines. The same lines can also be used for
system initialization and configuration.

Figure 3 - Two-dimensional I/O fabric enables arrays of processors to work in parallel.

Figure 4 - Deterministic worst-case transfer latencies look like memory wait-states.


Scalability 
Scalability is an important requirement for 
the design effort and product field upgrades. 
Constantly changing standards may require 
adding or removing processors late in the 
design cycle or even after field deployment. 
In the past, adding or removing proces- 
sors has always been difficult when using 
centralized DMAs for movement of data. 
In any centralized I/O structure, removing 
or adding new components is likely to 
affect other system components. Two- 
dimensional I/O structures are much less 
sensitive to design changes. Adding anoth- 
er processor to a chip is as simple as wrap- 
ping it with a 2D-fabric peripheral and 
connecting the respective data transport 
links to the existing fabric. This can be easily
done without affecting any hardware or
software already in place.


Low Latency and Deterministic Data Transfers 
In computing environments where hun- 
dreds of processors are simultaneously 
exchanging data, how can you guarantee 
that any one of those transfers will
arrive at its destination within a
fixed amount of time? Buses, crossbars,
and other centralized I/O structures force 
all data traffic through one central loca- 
tion, creating huge traffic jams. Two- 
dimensional I/O structures, however, can 
easily guarantee data delivery by spreading 
out data traffic across the design. As 
shown in Figure 7, a two-dimensional 
data transport grid dispersed across the 
entire design area removes communica- 
tion bottlenecks to allow individual trans- 
fers to complete on time, without 
interfering with other transfers. 


Individual processors must use worst-case
transfer latency when planning data
transfers. Although it is acceptable for data
to wait to be transferred, processors waiting
for data are wasting precious processing
cycles. Total transfer latency depends on
the worst-case latency across one processing
node and the number of intermediate
processing nodes between the source and
destination nodes.


[Figure 5 labels: 2D-fabric ready for the next write bus cycle. Packet exit direction and YX destination coordinates come from the write address bus. Data payload comes from the write data bus. The 2D-fabric peripheral converts one bus write cycle to one packet.]

Figure 5 - Conventional write bus cycles launch data packets.

Figure 6 - Conventional read bus cycles receive data packets.

Worst-Case Latency Across One Processing Node

A 50 ns packet latency across one node represents the time elapsed from when the packet started entering the node to the time when it started exiting that node. A packet delay time is the time from when it starts entering the node to the time when it completely emerges. Thus, a 100 ns packet delay time is 50 ns latency plus another 50 ns for the packet to fully emerge from the node.

If packets exiting from a given output 
port can arrive from three different sources, 
the worst-case latency for any one packet is 
250 ns. This is equal to the best-case laten- 
cy of 50 ns plus two packet-delay slots of 
100 ns each. 
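
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the reasoning in the text; the function name and parameters are hypothetical, and it assumes that a packet can wait behind at most one full packet from each competing source.

```python
def worst_case_node_latency_ns(best_case_ns, packet_delay_ns, sources):
    """Worst-case latency across one node for a shared output port.

    Assumption: in the worst case a packet waits one full packet delay
    slot for each of the other sources feeding the same output port.
    """
    if sources < 1:
        raise ValueError("at least one source is required")
    return best_case_ns + (sources - 1) * packet_delay_ns


# Values from the text: 50 ns latency, 100 ns packet delay, three sources.
print(worst_case_node_latency_ns(50, 100, 3))
```

With a single source there are no delay slots, so the worst case collapses to the 50 ns best case.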


Worst-Case Latency Across 

Several Processing Nodes 

If the worst-case latency for crossing one processing node is 250 ns, the worst-case latency for an entire transfer chain of two nodes, for example, would amount to 500 ns. Thus, if a packet is launched from a source processor two nodes away from its destination, it will take a maximum of 500 ns to arrive at its destination processor, regardless of any other data traffic in the system (Figure 8).


Total Latency 

Because 2D-fabric appears to processors as if 
it were memory, and because transfer laten- 
cy increases with the geographical distance 
from the source, processors can treat transfer 
latency as memory wait states for the pur- 
pose of scheduling the transfers. In a fully 
deterministic way, the further you go, the 
more wait states will be required to complete 
a transfer (Figure 4). 
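
As a rough sketch of how a processor might turn node counts into wait states, the Python below multiplies the per-node latency by the number of nodes crossed and rounds up to whole bus cycles. The per-node figures come from the text; the bus cycle time and the function names are assumptions made for illustration only.

```python
import math

NODE_BEST_NS = 50    # best-case latency across one node (from the text)
NODE_WORST_NS = 250  # worst-case latency across one node (from the text)


def transfer_latency_ns(nodes, per_node_ns=NODE_WORST_NS):
    """Latency of a transfer that crosses `nodes` processing nodes."""
    return nodes * per_node_ns


def wait_states(nodes, bus_cycle_ns):
    """Whole bus cycles needed to cover the worst-case latency.

    `bus_cycle_ns` is a hypothetical processor bus cycle time.
    """
    return math.ceil(transfer_latency_ns(nodes) / bus_cycle_ns)


print(transfer_latency_ns(2))                # worst case for two nodes
print(transfer_latency_ns(2, NODE_BEST_NS))  # best case for two nodes
print(wait_states(2, bus_cycle_ns=10))       # assuming a 10 ns bus cycle
```

Scheduling against the worst-case figure keeps the transfer deterministic, at the cost of a few idle cycles when the fabric is lightly loaded.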

Although actual latency for the above 
example is most likely to be closer to the 
best-case latency of 100 ns, the worst-case 
latency should always be used when plan- 
ning data transfers between processors. In 
some I/O fabrics, worst-case latency can be 
further reduced by launching packets in spe- 
cific routing directions to avoid interference 
with other packets, thus reducing the num- 
ber of packet delay slots from two to one, or 
even down to zero. 

As shown in Figure 4, 2D-fabric allows processors to easily determine the worst-case transfer latency for any destination inside the chip by simply counting the number of intermediate nodes. 2D-fabric also enables packets to be launched in any one of four possible directions by encoding exit directions in the address field of each data write cycle.

Figure 7 - Non-blocking multiple simultaneous data transfers

Conclusion

Two-dimensional inter-processor interfaces 
enable fast, easy, and efficient data commu- 
nications among hundreds of data processing 
elements of 3G functions implemented 
inside Virtex-II FPGAs. In addition to 3G, 
two-dimensional I/O also benefits voice- 
over-packet, routers, medical imaging, radar, 
and sonar applications. Linking processing 
elements with 2D-fabric increases system 
performance by enabling multiple processors 
to process data in parallel. At the same time, 
2D-fabric reduces power consumption by 
minimizing the total distance that data has to 
travel inside chips. 

And because it looks to the processors like 
conventional memory, 2D-fabric does not 
force system programmers to change their 
programming methods to benefit from high- 
er performance. Serial programming code 
investment is preserved, because each pro- 
cessing element has only one processor. 

Finally, system designers can now drasti- 
cally increase processing throughput and I/O 
bandwidth while retaining current processor 
architectures and design tools. 

For more information on the 2D-fabric parallel-processing interface, go to www.xilinx.com/products/logicore/alliance/crossbow/crossbow.htm.

Figure 8 - Latency can be determined by counting intermediate nodes.
Bluetooth Wireless Technology Gets BOOST Lite Processor

For easy and fast access to Bluetooth wireless technology, NewLogic offers the BOOST Lite processor for Xilinx Virtex FPGA products.





by Yan Siang Goh 
NewLogic Technologies, Inc. 
yan.goh@newlogic.com 


In recent months, many Bluetooth™- 
enabled products have been slowly appear- 
ing in the market. The proliferation of 
these products strengthens the acceptance 
of Bluetooth wireless technology in the 
market and as a global standard for short 
distance wireless communication. 

The BOOST Lite™ baseband processor for Bluetooth technology is designed for easy integration into the Xilinx Virtex™ FPGA family. The BOOST Lite processor is based on the popular BOOST Core™ processor and is complemented with the Bluetooth protocol stack and BOOST software to implement a complete Bluetooth wireless system.
BOOST Lite Core Features

• Based on BQB version 1.1 qualified BOOST Core processor
• Interface to most major Bluetooth radios in the market
• Supports co-existence with 802.11b, piconet, and scatternet operation
• Supports Bluetooth low power modes
• Supports all data and voice packet types
• ARM™ processor AMBA-type interface
• Optimized for the Xilinx Virtex family.


BOOST Lite Architecture 

The BOOST Lite core has a fixed bus interface to an external ARM processor and external Bluetooth radio (see Figure 1). Some external RAM and ROM (as well as EPROM, EEPROM, and so on) are necessary to host the BOOST software, which is the Bluetooth protocol stack. The CVSD encoder (available as an option) and a voice coder are necessary to support voice operation. For data applications, it is possible to input/output a data stream from a UART.

Figure 1 - BOOST Lite architecture

The BOOST Lite core interfaces to a fast processor bus or the AMBA-type bus. This bus ensures that data is moved quickly between the processor and the exchange memory, which is accommodated internally on the Xilinx Virtex FPGA.

From the architecture, 
you can see that the BOOST Lite processor 
is designed as an “out of the box” Bluetooth 
wireless solution. The BOOST Lite proces- 
sor can be integrated with any other third- 
party system, such as a printer or test 
equipment core, that requires Bluetooth 
functionality. 

Furthermore, you can use the BOOST Lite processor for fast prototyping before committing to an ASIC/ASSP design cycle. This enables an easy progression from an FPGA prototype to a silicon solution.

NewLogic offers an option for BOOST Lite users to upgrade to the BOOST Core processor as a full, source code version of a Bluetooth baseband processor.





Figure 2 - BOOST Lite development board (labeled component: external ARM7TDMI processor)

A BOOST Lite development board is also available (see Figure 2). The board is supplied with an ARM7TDMI™ processor, external Bluetooth radio, and a Xilinx Virtex FPGA as the BOOST Lite core.


Conclusion 

With the availability of BOOST Lite processors for the Virtex FPGA family, the process of designing a Bluetooth-enabled system using Xilinx Virtex FPGAs is greatly simplified. For more information on the BOOST™ family and the WiLD™ 802.11 WLAN technology family, visit NewLogic’s website at www.newlogic.com.
Decode MPEG-2 Video with Virtex FPGAs

Amphion’s CS6651 video decoder enables the decompression of video streams in real time.

by Rick Richmond
Senior Design Engineer
Amphion
rick@amphion.com


MPEG-2 is the digital video paradigm of today. It is at the heart of the Digital Video Broadcast (DVB) and Advanced Television Systems Committee (ATSC) standards, as well as high-definition digital television systems and DVD-video, which has seen incredible market growth in recent years.

The widespread adoption of these applications and systems, coupled with considerable investment by broadcasters and distributors, indicates that MPEG-2 is going to be around for a good while to come, despite the emergence of new, even better video compression algorithms. Future consumer digital applications, in which audio, video, and data networking technologies converge, are certain to need built-in MPEG-2 video capability.

Yet MPEG-2 video is fairly complex and computationally intensive to decode. The main features of the algorithm are discrete cosine transform (DCT)-based compression and motion estimation techniques. Until now, decoder implementations for digital STBs (Set Top Boxes) and DVD-video players have been the domain of ASIC implementations or software running on very powerful processors.

As the sophistication of products like 
STBs grows, designs will require ever-increas- 
ing flexibility and ever-decreasing develop- 
ment time scales. Now, using the Xilinx 
Virtex™ series of FPGAs and a new intellec- 
tual property (IP) core from Amphion, 
building MPEG-2 video into your designs is 
simple. Amphion’s CS6651 video decoder, 
which incorporates an integrated external 
SDRAM memory controller and display 
direct memory access (DMA), is a great 
example of the Platform FPGA capability of 
the Xilinx Virtex series. This solution allows
decoding of MP@ML MPEG-2 video with
NTSC or PAL frame rates and resolutions.


The MPEG-2 Video Algorithm

MPEG-2 MP@ML video provides a generic video compression solution for applications such as satellite, terrestrial, and cable television (in DVB and ATSC formats), as well as optical storage (DVD-video) at NTSC, PAL, and SECAM resolution and frame rates.

MPEG-2 video sequences are composed of three different types of pictures:

• Intra coded pictures (I-pictures), which are compressed using DCT-based techniques

• Predictive coded pictures (P-pictures), which use motion compensation to predict the current picture from a past reference picture

• Bidirectional predictive coded pictures (B-pictures), which use predictions from past and future reference pictures. B-type pictures are not themselves used as reference pictures.


DCT-based Compression 

In MPEG-2 MP@ML video, each picture is broken down into 16x16-sized blocks of luminance samples, called macro blocks. These blocks are further divided into 8x8 blocks. Each macro block has six blocks in total: four luminance blocks and two subsampled chrominance blocks (a 4:2:0 chrominance format). In the encoding process, a two-dimensional, 8-point DCT is applied to each 8x8 block; the resulting coefficients are quantized using a 64-element quantization matrix.

This process reduces amplitude and increases the number of zero-value coefficients. The quantized DCT coefficients are reordered in a zigzag fashion into a one-dimensional stream, effectively grouping together runs of zero-valued coefficients interspersed with non-zeros. This stream of run-level pairs is then encoded using Huffman-style variable length codes (VLCs) based on a statistical model.
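
The zigzag reordering and run-level pairing can be illustrated with a short Python sketch. It is a simplification made for this example: it uses only the classic zigzag scan, treats every coefficient the same way, and stops before the VLC stage. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; alternate direction on odd and even diagonals.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_pairs(block):
    """Scan a quantized block in zigzag order and emit (run, level) pairs.

    `run` counts the zeros that precede each non-zero `level`.
    Trailing zeros are dropped, as an end-of-block code would imply.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, -3, 5
print(run_level_pairs(block))
```

Because quantization pushes most high-frequency coefficients to zero, a typical block reduces to a handful of pairs followed by an implied run of zeros.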


Motion Compensation 

In predictive and bidirectionally predictive coded pictures, each macro block may have a number of pairs of motion vectors. These specify the horizontal and vertical displacement from the current position at which the stored reference picture best resembles the current macro block. The difference (if any) between the motion-compensated prediction and the actual image is coded using the DCT-based techniques described above. The sum of the predicted samples and the prediction error gives the final reconstructed macro block.
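
A minimal Python sketch of this reconstruction step follows. It assumes a single forward prediction with whole-sample motion vectors and 8-bit samples, and it performs no bounds checking; real MPEG-2 decoding also handles half-sample vectors and bidirectional averaging. All names are hypothetical.

```python
def reconstruct_block(reference, top, left, dy, dx, error, size=8):
    """Add prediction error to motion-compensated reference samples.

    (top, left) is the block position in the current picture and
    (dy, dx) is the motion vector in whole samples (an assumption).
    Results are clamped to the 8-bit sample range.
    """
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            predicted = reference[top + dy + r][left + dx + c]
            row.append(max(0, min(255, predicted + error[r][c])))
        out.append(row)
    return out


# Tiny 4x4 reference picture and a 2x2 block, for illustration only.
reference = [[r * 4 + c for c in range(4)] for r in range(4)]
error = [[1, -5], [300, 0]]
print(reconstruct_block(reference, top=1, left=1, dy=-1, dx=1,
                        error=error, size=2))
```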





Decoding MPEG-2 Video

As devices and applications grow in complexity, the Amphion CS6651 MPEG-2 video decoder IP core can be implemented by taking advantage of the capabilities of Xilinx Platform FPGAs.

As part of a demonstration system, the core is implemented in a Xilinx Virtex XCV800 device. Including extra interfacing glue logic and PAL/NTSC video encoder driver logic, the core consumes fewer than 8,000 slices and 26 block RAMs. The implementation benefited greatly from the following features of the target device:

• Ample high-performance block RAM
• Fast I/Os
• Extensive logic resources.

The functional blocks and simplified interfaces of the Amphion CS6651 MPEG-2 video decoder IP core are shown in Figure 1. The core requires a minimum clock speed of 27 MHz to maintain MP@ML decoding rates. Video elementary streams are accepted into the core via the byte-wide ES_Data input port on the elementary stream interface.


Parser 

The front end of the core is the video elementary stream parser. It searches the syntax of the incoming stream for start codes at which decoding may commence. The parser extracts the various encoding parameters from the headers, which are used to direct subsequent decoding. The remaining variable-length encoded picture data is passed on to the VLC decoder.


VLC Decoder 

Here, the Huffman-style VLC picture data is decoded. The outputs of this block include DCT block run-level codes and motion vectors for motion compensation of each macro block.


Run-Level Decoding and Inverse 
Quantization 

The run-level decoder converts run-level codes from the VLC decoder into complete blocks of 64 quantized DCT coefficients. These coefficients are then converted from zigzag scan order to natural row order before being dequantized. Virtex block RAMs are used to support the scan conversion operation and the storage of custom quantization matrices, which may be sent in the elementary stream headers.
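
The three steps named above (run-level expansion, scan conversion, dequantization) might be sketched in Python as follows. The dequantization here is a plain multiplication chosen for clarity; it omits the scaling, rounding, saturation, and mismatch control that the MPEG-2 standard specifies, and it does not describe how the CS6651 core is built. Function names and the `scale` parameter are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def decode_block(pairs, quant_matrix, scale=1, n=8):
    """Expand (run, level) pairs, undo the zigzag scan, then dequantize."""
    scanned = []
    for run, level in pairs:
        scanned.extend([0] * run)   # zeros that precede the level
        scanned.append(level)
    if len(scanned) > n * n:
        raise ValueError("run-level data overflows the block")
    scanned.extend([0] * (n * n - len(scanned)))  # implied trailing zeros

    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(scanned, zigzag_order(n)):
        # Simplified dequantization: multiply by the matrix entry and scale.
        block[r][c] = value * quant_matrix[r][c] * scale
    return block


flat_matrix = [[16] * 8 for _ in range(8)]
coefficients = decode_block([(0, 12), (0, -3), (1, 5)], flat_matrix)
print(coefficients[0][0], coefficients[0][1], coefficients[2][0])
```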


Inverse DCT 

This unit performs the computationally intensive inverse DCT (IDCT) on the dequantized 8x8 blocks of DCT coefficients. Making use of the high-speed on-chip block RAM, the IDCT unit is capable of streaming data continuously, transforming in 64 clock cycles an 8x8 block of DCT coefficients into an 8x8 block of luminance or chrominance samples or prediction errors.
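
For reference, the 8x8 inverse DCT can be written as two passes of a one-dimensional transform. The Python below is a floating-point sketch of the mathematics only; it says nothing about how the core's hardware pipeline is organized or how it meets the 64-cycle figure.

```python
import math

N = 8


def _c(k):
    """Normalization factor: 1/sqrt(2) for the DC term, 1 otherwise."""
    return math.sqrt(0.5) if k == 0 else 1.0


def idct_1d(coeffs):
    """8-point inverse DCT of one row or column."""
    return [
        0.5 * sum(_c(u) * coeffs[u] *
                  math.cos((2 * x + 1) * u * math.pi / 16)
                  for u in range(N))
        for x in range(N)
    ]


def idct_2d(block):
    """Separable 8x8 inverse DCT: transform rows, then columns."""
    rows = [idct_1d(row) for row in block]
    cols = [idct_1d([rows[r][c] for r in range(N)]) for c in range(N)]
    return [[cols[c][r] for c in range(N)] for r in range(N)]


# A block with only a DC coefficient decodes to a flat block.
dc_only = [[0.0] * N for _ in range(N)]
dc_only[0][0] = 800.0
samples = idct_2d(dc_only)
print(round(samples[0][0], 6), round(samples[7][7], 6))
```

The row-column split is the usual reason an IDCT maps well onto hardware: the same one-dimensional unit can be reused for both passes with a transpose buffer in between.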
Motion Compensation and Picture Reconstruction

For each macro block in a P-picture or B-picture, the motion compensation unit takes the decoded motion vectors from the VLC decoder and translates them into row and column coordinates for the prediction samples in the reference picture. The frame store memory then requests these samples and retrieves them via the SDRAM interface. The samples are combined with other prediction samples, if necessary, to complete the prediction for the macro block.

The final stage of decoding is to add the prediction samples to the prediction error corrections from the inverse DCT unit and write the reconstructed samples into the frame store memory. If a block has no predicted samples, then the samples from the inverse DCT are the final samples and are passed straight through. Both motion compensation and picture reconstruction employ Virtex block RAMs for buffering predictions, samples, or prediction errors and final reconstructed samples.


SDRAM Interface 

The frame store memory can be implemented using a commodity PC100 or better 64-Mb SDRAM part. The SDRAM interface handles the mapping of row and column motion compensation prediction requests and reconstructed sample writes into linear memory addresses. Motion compensation places particular memory bandwidth demands on the decoder implementation. To achieve adequate decoding performance, this unit must also arbitrate among the other functions, such as the display DMA, that access the frame store.
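
The idea of mapping row and column coordinates to a linear address can be shown with a deliberately simple Python sketch. The raster layout, the one-byte sample size, and the 720-sample line width are assumptions made for illustration; the article does not describe the core's actual memory organization, which may be arranged to suit SDRAM banks and bursts.

```python
def linear_address(frame_base, row, col, width=720):
    """Byte address of a sample in a raster-ordered frame (hypothetical).

    Assumes one byte per sample and `width` samples per line.
    """
    if row < 0 or not 0 <= col < width:
        raise ValueError("coordinates outside the frame")
    return frame_base + row * width + col


print(hex(linear_address(0x1000, row=2, col=5)))
```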
Figure 1 - Simplified block diagram of Amphion CS6651 MPEG-2 video decoder IP core


Display DMA 

The display DMA unit retrieves decoded samples from the frame-store memory line-by-line for display. This unit has a configurable double-byte output interface for luminance and chrominance samples and can also perform chrominance upsampling to a 4:2:2 format in the vertical direction. This interface provides a number of handshake signals and flags (not shown in Figure 1) to easily allow for the addition of extra logic to create sync pulses suitable for connecting the decoder to an NTSC or PAL video encoder chip.
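
Vertical chrominance upsampling from 4:2:0 to 4:2:2 doubles the number of chrominance lines. The Python sketch below uses line repetition, the simplest possible method, purely to show the change in dimensions; the article does not say which filter the display DMA unit uses, so this should not be read as the core's algorithm.

```python
def upsample_chroma_vertical(chroma_rows):
    """Double the chroma line count by repeating each line (assumption)."""
    out = []
    for row in chroma_rows:
        out.append(list(row))
        out.append(list(row))
    return out


chroma_420 = [[100, 110], [120, 130]]
print(upsample_chroma_vertical(chroma_420))
```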


Host Interface and Control 
Access to internal control, status, and video stream parameter registers within the core is provided via the host interface. Simple 32-bit read/write access to the frame store is also available. In the demonstration system, a host processor controls the core to perform special effects modes such as pause and fast-forward in response to user commands.


Conclusion 
MPEG-2 video decoding promises to be 
an important feature of many future 
products. In addition to the Amphion 
CS6651 MPEG-2 video decoder IP core 
implemented on a single Xilinx Virtex 
Platform FPGA, Amphion is developing 
even more sophisticated FPGA-based 
decoders. 

For the latest information about Amphion’s advanced decoders, visit www.amphion.com/video.html.
Bughunters @ Siemens

Siemens has developed a powerful diagnostic tool for its high-volume telephone switches. Advanced FPGA technology and high-productivity EDA tools from Xilinx have enabled sophisticated test strategies.





Most people are unaware that hundreds of 
microprocessors are involved in placing a 
cellular telephone call. Although this may be 
a surprising fact for the average cell phone 
user, mobile communications experts are 
even more astonished that such a complex 
system is stable and seemingly robust. 

For despite a well-defined development 
methodology and quality assurance meas- 
ures, bugs are sure to exist in any micro- 
processor code or software. Particularly 
annoying are sporadic bugs, which show up 
only under special, rare circumstances at 
intervals as long as months apart. Typically, 
such bugs are associated with the dynamics 
of the multiprocessor system. 

Of course, there are plenty of debug 
tools to analyze software, but applying 
them may change the dynamics of the sys- 
tem and can therefore mask the problem. 

This is where the Siemens Hardware (HW) Tracer comes into its own in identifying software bugs before they can cause problems in the field. Based on Xilinx Virtex™-E FPGAs, HW-Tracer is a stand-alone, rack-based data capture, analysis, and debug tool (Figure 1).


Hardware Tracer 

The HW-Tracer’s task is to acquire, filter, and qualify all bus cycles and additional monitor signals. These signals can be accessed via interfaces on the front panels of every processor and memory board and are recorded, together with a timestamp, in a trace memory that is 2 million cycles deep.
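
A software model can make the capture path easier to picture. The Python sketch below stores qualified bus cycles with a timestamp in a fixed-depth buffer. The class name, the qualifier callback, the dictionary used for a bus cycle, and the choice to overwrite the oldest entry when the buffer is full are all assumptions for illustration; the article states only the depth of the trace memory.

```python
from collections import deque


class TraceMemory:
    """Fixed-depth store of (timestamp, bus cycle) entries (hypothetical)."""

    def __init__(self, depth=2_000_000, qualifier=None):
        # deque with maxlen drops the oldest entry once the depth is reached.
        self.entries = deque(maxlen=depth)
        self.qualifier = qualifier or (lambda cycle: True)

    def capture(self, timestamp, cycle):
        """Record the cycle only if it passes the qualifier filter."""
        if self.qualifier(cycle):
            self.entries.append((timestamp, cycle))


# Keep write cycles only, in a deliberately tiny trace memory.
trace = TraceMemory(depth=2, qualifier=lambda cycle: cycle["write"])
for t, (addr, write) in enumerate([(0, True), (4, False), (8, True), (12, True)]):
    trace.capture(t, {"addr": addr, "write": write})
print([timestamp for timestamp, _ in trace.entries])
```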

The tracer can log the complete activity 
of the system in real time without interfer- 
ing with its operation. Moreover, it can also 
trigger specific events due to a tight cou- 
pling with the Coordination Processor’s 
(CP) operating system. 

HW-Tracer is essentially a logic analyzer 
with many proprietary extensions. Because 
of the number, high speed, and complexity 
of these extensions, we developed the HW- 
Tracer as an in-house tool for Siemens. For 
example, it tracked processes (whose address- 
es were determined at runtime) and com- 
piled a view containing only program jumps. 

To meet the demanding requirements of the new CP generation, HW-Tracer has to cope with a peak data rate of 2 gigabytes per second. Actual cycle rates can even be higher, as data bursts are transferred in a compressed format.

Figure 1 - Siemens “Bughunter” HW-Tracer with Virtex-E XCV2000E FPGA

To use the HW-Tracer’s two channels 
independently for processor or memory, we 
dynamically reconfigured the FPGA that 
acts as the heart of the tool. Reconfiguring 
the hardware to complete two different 
trace tasks halved the cost of the hardware. 


The Technical Challenge 
A number of key design considerations 
decided the final hardware implementa- 
tion. Because of their reprogrammability 
and dynamic reconfigurability, we selected 
Virtex-E FPGAs. 

The tasks the HW-Tracer had to per- 
form included: 

• Analyze nearly 200 input signals at a system speed of 75 MHz

• Capture a high number of 32-bit buses with a data path as wide as 144 bits, which would stress routing resources

• Cope with signal integrity issues on the board level.


Power aspects also had to be taken into 
account in the overall design. 

We conducted a feasibility study before we developed the new HW-Tracer. Specifically, we investigated whether FPGA technology was capable of dealing with the necessary system performance. Our study showed that the Xilinx Virtex-E family of FPGAs was the only technology available that could meet the requirements. We chose Virtex-E FPGAs because:

• Virtex-E devices have up to 804 user I/Os and a 130 MHz internal performance level.

• There were more than enough routing resources to capture the 32-bit buses with data paths as wide as 144 bits. With densities ranging from 58K to 4M system gates, a cascade chain for wide-input functions, and dedicated carry logic for high-speed arithmetic functions, the Virtex-E FPGAs were ideal for this task.

• The programmable SelectIO™ standards on Virtex-E devices were able to handle signal integrity issues at the board level, supporting 20 high-performance interface standards with as many as 804 single-ended I/Os or 344 differential I/O pairs.

• The fast, high-density, 1.8V Virtex-E family is designed for low-power operation.


The circuit design took a massive pipelining approach, mathematically speaking. In addition to a design complexity of about 400K gates, which was mainly determined by the logic in the data path, there was also complex control logic to be implemented. A unique feature in Xilinx FPGAs is a mode called SRL16 (Shift Register LUT), which can be used to increase the effective number of flip-flops per configurable logic block (CLB) by a factor of 16. (Adding flip-flops enables fast pipelining.)
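
To show what the SRL16 mode provides, here is a small behavioral model in Python: a 16-stage shift register whose output tap is selected by a 4-bit address, so one lookup table can stand in for a delay line of up to 16 flip-flops. The class and method names are hypothetical, and the model ignores clock enables and initialization values.

```python
class SRL16:
    """Behavioral model of a 16-bit shift register with a selectable tap."""

    def __init__(self):
        self.bits = [0] * 16

    def clock(self, data_in):
        """Shift one bit in; the oldest bit falls off the end."""
        self.bits = [data_in & 1] + self.bits[:15]

    def q(self, address):
        """Output of the stage selected by the 4-bit address.

        Address A returns the bit shifted in A + 1 clocks ago.
        """
        return self.bits[address & 0xF]


srl = SRL16()
for bit in (1, 0, 0, 0):
    srl.clock(bit)
print(srl.q(3), srl.q(0))  # the 1 has moved to stage 3 after four clocks
```

Choosing the address fixes the pipeline delay, which is how a single SRL16 can balance data paths of different depths without spending a chain of discrete flip-flops.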

The state machines required to control the operation modes of the HW-Tracer could not be pipelined and threatened to limit the achievable system speed. We have found, however, that arithmetic performance is not the bottleneck in today’s leading-edge FPGAs.


Choosing the Design Flow 

Despite the extreme performance needed, 
we decided to code the design in VHDL, 
which is technology-independent at a high 
level and controls the implementation via 
just the constraints in the synthesis and 
place-and-route tools. We have successfully 
used this automated design flow already in 
the past for several FPGA designs. It pre- 
vents design issues and saves time when 
modifying the design. 

Only the memory modules had to be 
instantiated; using the Xilinx CORE 
Generator™ tool, this proved to be quick 
and easy to implement. 

It was clear that we would have to utilize simulation as if we were developing an ASIC. Simulation (including a gate-level simulation with the VHDL output of Xilinx Alliance Series™ software) added to the overall design time but paid off at the end, when the FPGA was delivered with only two very minor bugs detected.


Striving for Timing Closure 
We selected Mentor Graphics’ Leonardo 
Spectrum™ software to synthesize the 
design and Alliance Series software for the 
back-end task. Both tools enabled us to 
implement a completely automated 
design flow. This was important because 
in past designs, timing closure could not 
be guaranteed; code changes tended to 
result in new worst-case paths that violat- 
ed the tight timing constraints. We have 
experienced this timing constraints prob- 
lem when designing with ASICs. 

The most recent releases of the Leonardo Spectrum synthesis tool address this issue successfully by interacting with Xilinx Alliance software, thus reducing the number of design iterations. We have seen improvements on the critical paths as high as 15%.

We carried out many implementation 
runs to incrementally move toward our 
performance goal. In our project, we 
started the development based on prelim- 
inary timing data; later updates of the 
timing data entailed some refinements of 
the constraints. 

We benefited a great deal from the stability and performance of Xilinx design software. We achieved turnaround times of three to four hours for the whole implementation task, which is excellent when taking into account the complexity and tough timing constraints.


Management Views 
The CP’s bring-up-phase depended heavily 
on the availability of HW-Tracer, which 
was developed in parallel. To address this 
risk, we added several additional verifica- 
tion steps to the process. 

The test equipment included in the VHDL system simulation comprised:

• Two main memory units, each with a million-gate ASIC containing two embedded CPU-cores

• Several different processing units, each with a million-gate ASIC and two CPUs

• Up to 16 peripheral models and an ATM-controller.


In the context of the system simulation, 
the CPU’s firmware was already verified. 

“The verification of the HW-Tracer at 
the virtual, simulated level ensured that it 
could be immediately used for testing the 
physical prototype of the CP unit. This is a 
great advantage over ASIC designs. Using 
FPGA simulation tools shortened our 
design time greatly,” said Johann Notbauer, 
Siemens CES Design Services’ technical 
director for ASIC and FPGA design. 

Friedrich Wilhelm, technical director and specialist for proprietary test tools at Siemens, added, “The high logic densities, flexible high-performance interface standards, and pipelining capabilities of Virtex-E combined with the powerful design tools made the choice of FPGA easy.”


Emulation 

The application software of the HW-Tracer had to be thoroughly verified before its first use. In this environment, the Tracer design was stimulated by a pattern generator, which was included in the FPGA design using internal Virtex-E block memories. The patterns were again derived from the system simulation. In the final product, the pattern generator is used for a “learn mode,” which helps users familiarize themselves with HW-Tracer before it is connected to an actual CP unit for test and debug.


Internet Reconfigurable 
In order to facilitate troubleshooting on 
any of the switches installed throughout 
the world, the HW-Tracer was designed as 
a portable device and squeezed into a 3U- 
CompactPCI chassis. 

The FPGAs were loaded by the embedded PC, which took the configuration data from the hard disk. Embedded software as well as FPGA design files are accessible on a dedicated homepage, where future updates and upgrades can be downloaded.


Conclusion 
The powerful combination of Mentor 
Graphics’ Leonardo Spectrum synthesis 
with Alliance software and Virtex-E hard- 
ware from Xilinx enabled Siemens to 
bring its new range of telephone switches 
to market. Virtex-E FPGAs were used at 
the heart of the HW-Tracer, which was 
used to debug and test the Coordination 
Processor project. The software combina- 
tion allowed FPGA design iterations to be 
completed quickly without affecting the 
tight timing constraints. We chose Virtex- 
E devices for their high logic densities, 
high system speeds, flexible system I/O, 
and ability to perform fast pipelining uti- 
lizing the SRL16 mode. 

As one customer put it, “The HW-Tracer is the most important debug tool in the CP project.”
Answers You Need When You Need Them

Need help? Simply go to support.xilinx.com on the World Wide Web.


by Doug Horne 

Product Marketing & Web Development 
Global Services Division 

Xilinx, Inc. 

Doug.Horne@xilinx.com 


A robust, comprehensive, online resource, support.xilinx.com stands ready to answer all your design questions 24 hours a day, 7 days a week, 365 days a year (366 in a leap year). With customized access to such features as Answers Search, Software Manuals, Tech Tips, Forums, Problem Solvers, techXclusives, WebCase, Software Updates, and Agents, you can get the answer you need when you need it (Figure 1):


• Answers Databases: Use our advanced search or answer browser tools to easily access the latest answers to technical issues from our huge database of more than 4,000 answers, indexed by logical category.

• Software Manuals: View manuals in PDF, compressed PDF, or in convenient search-enabled HTML format.

• Tech Tips: Get the latest technical information about development tools, device families, interface tools, and Virtex-II Pro™ FPGAs.

• Forums: Collaborate with other designers in discussion groups or chat rooms; join the popular newsgroup comp.arch.fpga.

• Problem Solvers: Get instant help with installation and configuration, PCI applications, and JTAG implementation. This interactive tool uses a series of questions to diagnose and troubleshoot your configuration or installation problem automatically, saving you hours of work.

• techXclusives: Expert Xilinx application engineers will keep you informed of the hottest issues and latest techniques.

• WebCase: Use our WebCase to manage hotline cases. You can check status, add notes, and even close a case on the Web, anytime.

• Latest Answers: View the answer records you visited, make note of the most recent answers published by Xilinx engineers, and see what visitors are saying.

• Tech Tips: This is the best resource for hot issues and tips that will get you up and running quickly.

• My Bookmarks: Create links to your favorite websites and Answer Records, or search queries on support.xilinx.com.

• Stock Quotes: Watch your favorite stock ticker symbols, latest price, change, and last trade information.


Do It Today

Take advantage of the powerful tools at your disposal by visiting support.xilinx.com today. To sign up for MySupport.xilinx.com, visit mysupport.xilinx.com, or click on the MySupport logo on the support.xilinx.com home page. Contact Doug Horne at 408-626-6317 or doug.horne@xilinx.com.
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Learn Smarter, 
learn Faster 


With its new Designing. fo 


—Pestormance Lie Otlne, cou se 


over the Web. Leam if your desk 
from trained specialists = live online. 


by Cindy Andruss 

Technical Writer/Production Editor 
Xilinx, Inc. 
cindy.andruss@xilinx.com 


Reduced training budgets, travel restric- 
tions, and explosive advances in program- 
mable logic technology make it harder than 
ever to stay on the high side of the learning 
curve. Combined with demands to do 
more in less time, your job as a design
engineer, or as a manager of a design team,
requires an innovative training solution
that will save you time and money.

In addition to developing state-of-the-
art logic devices and software, Xilinx also 
continues to pioneer educational oppor- 
tunities that reduce engineers’ time to 
knowledge and increase their proficiency 
in using Xilinx FPGA design tools. The 
latest offering from Xilinx Education 
Services, Designing for Performance Live 
Online, is an education package that 
combines the best of live instruction with 
none of the inconveniences and lost 
opportunity costs of travel to off-site
training centers.


Learning Without Luggage 

Design engineer Lukose Ninan was just get- 
ting into Xilinx FPGA design when he found 
out about our first online FPGA training, e- 
Series I. “I wanted a refresher course,” said 
Ninan of ComSonics Inc. “e-Series I is exactly
what I wanted to get up to speed on Xilinx
design, and I got the information I needed 
without having to travel.” 

When Mike Schell of Convergent 
Design Inc. learned about the online pro- 
gram, he enrolled immediately. “The num- 
ber one reason I prefer e-learning classes is 
convenience,” said Schell, a design consultant.
“I have deadlines to meet, and it would
be lost time if I had to travel to a class.” 

Ninan and Schell are among a growing 
number of engineers and managers who 
use online e-learning programs to acquire 
or enhance existing skills (see “Benefits for 
Both Managers and Engineers”). According 
to a report from Merrill Lynch, employees 
in more than half of U.S. corporations used 
e-learning training programs in 2000. Here 
at Xilinx, online course enrollments have 
steadily increased since our initial e-learn- 
ing offerings began in 2000. 


Virtual Education 

The Designing for Performance Live Online 
package consists of five one-hour lecture 
modules and four two-hour lecture-and- 
lab modules (see “Designing for 
Performance Live Online Modules”). This 
series of modules was selected from the 
popular Designing for Performance course. 
Each module is delivered live on the 
World Wide Web. 

The entire package is available for $900 
USD or nine training credits, and you can 
register and pay for the package online. If 
you prefer, you may purchase modules 
individually. 

The modules are scheduled sequentially, 
two per week, over a five-week time period. 
Some students say these smaller units of 
content provide a more lasting learning 
effect compared with the average content 
retention rate for intensive, all-day seminars. 

“It’s not necessary to have all the information
in an eight-hour class,” Schell said.
“That's what I really like about the program.
You have a day or two to absorb the
information before the next session. It’s
easier to learn that way.”

Benefits for Both Managers and Engineers

As a manager, you will benefit from Designing for Performance Live Online if:

• You plan to hire engineers throughout the year and would like to offer this
  packaged training to new hires as they come on board.

• You do not have a sufficient number of engineers at any one time to warrant
  an on-site learning program.

• You do not want to send the engineers away for training because of time or
  budget restrictions.

• You want accessible, consistent, and affordable training for your engineers who
  have experience with Xilinx ISE software tools but who need to enhance their
  knowledge of FPGAs.

As a design engineer, you will learn how to:

• Use synchronous design techniques to improve performance.

• Design synchronization circuits to improve design reliability.

• Write HDL code to efficiently target Virtex-II architecture resources.

• Generate customized cores using the CORE Generator™ system.

• Estimate power consumption using the XPower utility.

• Pinpoint design bottlenecks by interpreting timing reports.

• Apply advanced timing constraints to meet your performance objectives.

• Improve design performance by using advanced implementation options.

The predetermined schedule lets you 
lock in your dates ahead of time. If by 
chance you miss a session, the sequence is 
repeated every five weeks. You may even 
begin the series with almost any of the first 
five modules, but units 6 through 9 are 
somewhat dependent upon each other and 
should be taken sequentially. Once you 
have completed Designing for Performance 
Live Online, you'll receive a Certificate of
Completion to add to your record of continuing
education credits.


A Classroom at Your Fingertips 

As the name promises, Designing for 
Performance Live Online is delivered by a 
live instructor — no recordings. At the 
beginning of each interactive session, the 
instructor will review labs and address any 
questions you may have before moving on
to the lecture.

Real-time, synchronous training sessions
create an environment where you can ask 
questions and hold collaborative discus- 
sions. Also, you can use the chat feature to 
ask questions of other engineers in the class, 
or pose your questions to the instructor. 

All you need to begin Designing for 
Performance Live Online are a Web-enabled 
computer and a telephone line. You will log 
on to an online server (provided by Toolwire) 
to run the lab exercises. Before enrolling in 
this class, you must pretest your system to 
ensure that it will perform the labs in the 
Toolwire virtual environment. Go to 
support.xilinx.com/support/training/using- 
toolwire.htm and follow the steps for testing 
your network, connection speed, firewall 
compatibility, and installation of the Citrix 
ICA client software. Once you have pretest- 
ed your system, you will be able to connect 
to the Toolwire remote Windows 2000 desk- 
top. If any issues arise, simply contact the
registrar at 1-877-959-2527 for support.


After you register for Designing for
Performance Live Online, Xilinx will send
you a series of e-mails containing the URL
and phone number for each session,
along with lab requirements and
instructions. Ten minutes before each session
begins, you can log in and download
the presentation materials and lab
documentation so you will be prepared for class.


Conclusion 

The Designing for Performance Live
Online series delivers a convenient,
advanced training solution for design engineers
who have taken the Xilinx
Fundamentals course or who have equivalent
knowledge of Virtex™-II architecture,
software tool flow, and global timing constraints.
This Live Online course focuses on
enhancing your knowledge of the latest
Xilinx FPGA design tools and techniques.
To learn more about Designing for
Performance Live Online or other Xilinx
e-learning courses, visit the Xilinx Education
Services website at support.xilinx.com/support/
education-home.htm or call the registrar
at 877-XLX-CLAS (877-959-2527).


Designing for Performance 
Live Online Modules* 


1. FPGA Design Techniques
2. HDL Coding Style (and lab)
3. Synthesis Techniques (and lab)
4. CORE Generator system (and lab)
5. XPower
6. Achieving Timing Closure
7. Timing Groups and Offset Constraints
8. Path-Specific Timing Constraints (and lab)
9. Advanced Implementation Options


*Modules are subject to change to stay current 
with emerging technology. For the most up-to- 
date information, visit the Xilinx Education 
Services website at support.xilinx.com/support/ 
education-home.htm.

Xilinx Technology Enabled Instant
Deployment of Real-PCI Express


Xilinx delivered the world’s first PCI Express product on the same day the specification was finalized.


by Xilinx Staff


Last July — on the same day the PCI 
Special Interest Group (SIG) 
announced the final PCI Express 
specification — Xilinx delivered the 
world’s first PCI Express intellectual 
property (IP) core: Real-PCI™ 
Express. The solution expedited the 
implementation of PCI Express for 
Xilinx customers by 12 to 18 months, 
demonstrating the power of program- 
mable logic over ASIC technology. 

PCI Express is the successor to 
the legacy peripheral component 
interconnect local bus standard estab- 
lished by Intel. The new PCI Express stan- 
dard is targeted at the desktop, mobile, 
server, storage, and embedded communi- 
cations markets. 

“PCI Express takes PCI to another 
level, with a high-speed, scalable, serial 
architecture that provides exciting new 
I/O options for system partitioning 
designs and form factors,” said Tony
Pierce, PCI-SIG chairman.


The PCI Express Core 

The Xilinx Real-PCI Express core uses 
the proven RocketIO™ 3.125 Gbps 
serial transceivers on Xilinx Virtex-II 
Pro™ FPGAs — the only devices on 
the market capable of implementing the 
new specification. 

The Real-PCI Express interface can be 
used to maximize performance and feature 
quality in high-performance workstations 
as well as consumer gaming devices. 
Designers can use the core to design
high-performance PCI Express systems using
Xilinx Smart-IP™ technology to meet critical
2.5 GHz timing requirements. The
core reaches a line speed of 2.5 Gbps, uti- 
lizing the features of the RocketIO multi- 
gigabit transceivers, such as clock data 
recovery, 8B/10B encoding, 3.125 Gbps 
SerDes, transmit/receive FIFOs, and CRC. 
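
As a rough worked example of what the quoted line speeds mean for payload bandwidth: 8B/10B encoding sends 10 line bits for every 8 data bits, so the usable rate is 80 percent of the line rate. The sketch below assumes a single lane in one direction and ignores packet and protocol overhead; the function name is hypothetical and chosen only for illustration.

```python
def payload_rate_gbps(line_rate_gbps: float, data_bits: int = 8, line_bits: int = 10) -> float:
    """Usable data rate after 8B/10B line coding (illustrative helper, not a Xilinx API)."""
    return line_rate_gbps * data_bits / line_bits


# Line speeds quoted in the article.
pcie_lane = payload_rate_gbps(2.5)       # Real-PCI Express core line speed
rocketio_max = payload_rate_gbps(3.125)  # RocketIO transceiver maximum

# Convert Gbps to megabytes per second: 1 Gbps = 1000 Mbps, 8 bits per byte.
print(f"PCI Express lane: {pcie_lane:.2f} Gbps payload ({pcie_lane * 1000 / 8:.1f} MB/s)")
print(f"RocketIO maximum: {rocketio_max:.2f} Gbps payload")
```

Under these assumptions a 2.5 Gbps lane carries about 2.0 Gbps of encoded data, which is why the 3.125 Gbps transceivers have headroom for the PCI Express line speed.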


The Logic Advantage 

The programmability and serial transceiv- 
er capability of Virtex-II Pro FPGAs 
allowed Xilinx to develop the core simulta- 
neously with the definition of the PCI 
Express specification as it evolved. 

The shipment of the Real-PCI Express 
core at the same time a final specification 
was released allowed designers to begin 
prototyping PCI Express solutions imme- 
diately, well ahead of any ASIC-based 
implementations, according to Cary 
Snyder, a noted industry expert and analyst. 

“By using the programmability and
serial transceiver capability of the Virtex-II
Pro device, Xilinx was able to develop
its core in parallel with its participation in
the definition of the specification, a
true testament to the capability and
benefits of programmable systems,”
Snyder added.


Always Looking Forward 

Real-PCI Express is currently compatible
with both the protocol and electrical
requirements of the v1.0 base PCI
Express specification, but Xilinx is
continuing to participate as an active
developer in the PCI Express Advanced
Switching working group to develop
the communications extension for PCI
Express. Xilinx plans to incorporate
PCI Express with advanced switching into
its Virtex™-II series FPGAs.


License Price and Availability 


Real-PCI Express is
available now as a Xilinx
LogiCORE™ product under
the terms of the SignOnce™ IP
license and is priced at $25,000.
Once purchased, it may be
configured and downloaded
from the Xilinx website at
www.xilinx.com/pciexpress/

For more information and to
purchase Virtex-II Pro FPGAs,
visit www.xilinx.com/platform/

Information about licensing and
other Xilinx LogiCORE products is
available on the Xilinx IP Center at
www.xilinx.com/ipcenter/

For complete information
about Real-PCI Express, visit
www.xilinx.com/systemio/

Xilinx Events and Tradeshows

Xilinx participates in numerous trade shows and events throughout the year
to help you stay informed. This is a perfect opportunity to meet our silicon
and software experts, ask questions, see demonstrations of new products, and
discuss the latest trends. You'll meet people just like you and you'll see how
they are using programmable logic to solve their technical challenges.


Worldwide Event Schedule

Date            Event                                 Location
January 28-29   Platform Conference                   San Jose, CA
January 30-31   EDSF 2003                             Pacifico, Yokohama
February 17-21  3GSM                                  Cannes, France
February 19-20  Platform Conference                   Taipei, Taiwan
February 23-25  FPGA                                  Monterey, CA
February 25-27  Wireless Systems Conference           San Jose, CA
March 3-11      IIC China Spring                      Shanghai, Beijing, Shenzhen
March 17        Synopsys Users Group                  San Jose, CA
April 1-3       Global Signal Processing Expo         Dallas, TX
April 8-11      FCCM                                  Napa, CA
April 23-25     Embedded Systems Conference           San Francisco, CA
June 16         Embedded Processor Forum              San Jose, CA
June 2-4        40th Design Automation Conference     Anaheim, CA


For more information about Xilinx Worldwide Events, 


please contact one of the following Xilinx team members or see our website at: 


www.xilinx.com/events/ 


For North American shows, contact Jennifer Waibel at: jennifer.waibel@xilinx.com

For European shows, contact Andrew Stock at: andrew.stock@xilinx.com

For Japanese shows, contact Yumi Homura at: yumi.homura@xilinx.com

For Asia Pacific shows, contact Mary Leung at: mary.leung@xilinx.com

Want to win a new Handspring PDA? Heard any good jokes lately?

In each issue of the Xcell Journal we give you many golden
nuggets of information about programmable logic. So, we thought
it would be fun to bury a few “diamonds” for you to find as well,
something to challenge you and give you a diversion from your
usual routine. Some of these diamonds are buried deep and are
difficult to dig up, some lie there on the surface, easy to find.
Plus, we thought you might enjoy your colleagues’ humor.

With a funny joke and a keen eye, you have a good chance of
winning one of five Handspring™ Visor Pro™ PDAs.

Contest Rules

There are nine diamonds shown
below. Each comes from
somewhere in this issue of Xcell. Your
task is to find the page where each
diamond originated. Add up the nine
page numbers where you find each
diamond to give you a total; that’s
your answer. Send us your answer
along with your funniest clean joke
(one that is acceptable for printing).

The five winning entries will be the
ones with the correct answers and the
funniest jokes (as judged by the Xcell
editors). The five winners will receive
a new Handspring Visor Pro PDA.

Only one entry per person; Xilinx
employees and contractors, and their
families, are not eligible. The entry
deadline is May 1, 2003.

The winners and their jokes will be
announced in the next issue of Xcell.

Send your entry to
editor@xilinx.com, with the following
words in the subject line:

Xcell Diamonds - 123
(Replace 123 with your answer).

In the body of your e-mail send us
your funniest clean joke.

Write down the page number where you find each
diamond, add up the nine page numbers,
and send us the total along with your joke.
You could win a new Handspring PDA!

Here's a joke from us to get you started:
There are only 10 kinds of
engineers in the world:
Those who understand binary numbers,
and those who don't.
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Xilinx CPLD Product Selection Matrix 


PRODUCT SELECTION MATRIX 


CoolRunner-Il Family — 1.8 Volt 


(7s) [32 | 40 isnansas|isnansaal 1] as| a6 | 6 [3/1 
n 
n 
i 
sooo |) 40 isnezsaa|isnaesaa| 0/4) 6 |-6-7-0| <0 [3/1 
00 512) a sta2s83 si92s03|270|8 6 |67-10) 10 3/1 
CoolRunner XPLA3 Family — 3.3 Volt 
pro] a) 335 | 33 [36] | 5 [57-10 7-104 1 
Pisco [4 [ae | 335 | 33 [oe] | 6 |-6.7-0|7-0/4[ 1 
Faqoe [ie] 46 335 | 3a [wel | 6 [7-0] 7-10 [4] 
‘owe [ss] 4@[ 338 | aa [rea] | 75 |-7-10-12|-10-12[4 
‘oqo |aea) ae 235 | aa [azo] | 75|-7-10-12|-10-12[4 
aso |si2) 0) 335 | 33 [250 75 -7-10-12/-10-12) 
XC9500XV Family — 2.5 Volt 
a0 [36)o0] 2583 |ransaa last] 5 | 57 | 7 [3/18 
Frew |72| s0[ 2503 | raaseafre|i[s | s7 | 7 [a(n 
Fazoe [a| so 2583 | rasa fiv|2|s | s7 | 7 [3[x 
100 [208/90 259 | 1ars89 2/4) 6 |-6-7-10 7-0/3 1 
XC9500XL Family — 3.3 Volt 
a0 [36/90] 25a | 2533 [36] | 5 57-10] 7-10 (3/18 
reco | 2[90| 25035 | 2582 |r| | 5 |5.7-0|7-0[3[ 1 
jane o/s | aass 5 [sn 7-0/3 


6400 | 288) 90 | 2.5/3.3/5 


PLCC Packages (PC) 
44 

PQFP Packages (PQ) 
208 

VQFP Packages (VQ) 
44 

64 

ee) 

TQFP Packages (TQ) 
ee) 

144 










CooiRunner-tl 


PACKAGE OPTIONS AND USER I/O 


m2 )33] 7 | fac] || | ee 
nD Rn SRC: 


i EAE BMEM E pe Edy 
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*JTAG pins and port enable are not pin compatible 
in this package for this member of the family. 


Important: Verify all Data with Device 


Data Sheet and Product Availability with 
your local Xilinx Rep 


Automotive products are highlighted: 
-40C to +125C ambient temperature for CPLDs 


Xilinx IQ Solutions for
Automotive Intelligence
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Chip Scale Packages (CP) — wire-bond chip-scale BGA (0.5 mm ball spacing) 
z pee ite 
132 
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ais 
abet CCC 

Chip Scale Packages (CS) — wire-bond chip-scale BGA (0.8 mm ball spacing) 

48 BEEREE BORE 26 | 38 | | 

144 Sa | fro] 7) 

280 Sa a a ERC 


Elita 192 
BGA Packages (BG) — wire-bond standard BGA (1.27 mm ball spacing) 


2 EEREEREE EREEEE EERE BEE 


FGA Packages (FT) — wire-bond fine-pitch thin BGA (1.0 mm ball spacing) 


256 a a a 


FBGA Packages (FG) — wire-bond Fineline BGA (1.0 mm ball spacing) 
[fe TT fe 
Xilinx Virtex FPGA Product Selection Matrix






VO's 204 | 348 | 396 | 564) 692) 804) 852 | 996 |1164/1200 

Chip Scale Packages — wire-bond chip-scale BGA (0.8 mm ball spacing) 
————=s=— 

BGA Packages (BG) — wire-bond standard BGA (1.27 mm ball spacing) 

575 328 | 392} 408 

728 516 

FGA Packages (FG) — wire-bond fine-pitch BGA (1.0 mm ball spacing) 

256 140 | 140 SOnMZON 2 Neale 2 ali? 

456 156[2ag|2a8] | | | | | | 

676 

FFA Packages (FF) — flip-chip fine-pitch BGA (1.0 mm ball spacing) 
204 | 348 | 396 Note: * FF1148 and FF1696 packages support higher number of user I/O and 


Peele 56/5 evi lle eal zero RocketlO multi-gigabit transceivers 
CCC 
SS SSeS) 


CP 4010) 
FG456 
FG676 
FF672 

FF896 

FF1152 
FF1148 
FF1517 
FF1704 
FF1696 


BFA Packages (BF) — flip-chip fine-pitch BGA (1.27 mm ball spacing) 
957 co a Ea eee | | | | | |624}684| 684] 68a 





Note: Within the same family, all devices in a particular package are pin-out (footprint) compatible.
Virtex-II packages FG456 and FG676 are also footprint compatible.
Virtex-II packages FF896 and FF1152 are also footprint compatible.
* The FF1148 and FF1696 packages support a higher number of user I/O and zero RocketIO™ multi-gigabit transceivers.
Important: Verify all Data with Device Data Sheet (http://www.xilinx.com/partinfo/databook.htm)


Numbers indicated in the matrix are the maximum number of user I/Os for that package and device combination. I/Os for RocketIO MGTs
are not included in this table.





CLB Resources Memory Resources DSP Clock Resources 1/O Features 


















Virtex-II Pro Family - 1.5 Volt .13um Nine Layer Copper Process


LDF-25, VDS-25, LVDSEXT., [13iM| 4 [0 | 














































BIVDS-25, UIDS-25, IVPECL-25, | -5 -6 -7 
LVCMOS25, IVCMOS18, | -5 -6 -7 449M] 8 [1 | 
| * | 56x46 | 9,280 | 20,880 | 18,560 | 290 | 88 | 1,584 | 88 | 24/420 | 8 | YES | 276 | 564 | IvCMOSI5,PCi33,LVTTI, |_-5-6-7 Pa Ps ee 
| * | 80x46 | 13,696 | 30,816 | 27,392 | 428 | 136 | 2,448 | 136 | 24/420 | 8 | YES | 372 | 644 | IVCM0S33,PCLX,PCI66, GTL, |_-5 -6-7 S/o /1136M) 8 |2) v 
=] sexs [19392 | 43,632 | 38,784 | 606 | 192 | 3,456 | 192 | 2020 | 8 | YES | 396 | 804 | GTls,HSTLI(ISVIA, | 5-6-7 a: v 
| * | 88x70 | 23,616 | 53,136 | 47,232 | 738 | 232 | 4176 | 232 | 24/20 | 8 | ves | 420 | 852 | HOIN(1.5¥1.8¥, aa! Vv 
| * | 10482 | 33,088 | 74,448 | 66,176 | 1,034 | 328 | 5,904 | 328 | 24/420 | 8 | ves | 492 | 996 | — HSILIN(I.5Y1.8V), 5-6-7 Vv 
m 1,164 | HSTLIV(1.5W1.8¥), STL, | -5 -6 -7 Vv 
S | 136x106 | 55,616 | 125,136 | 111,232 | 1,738 | 556 | 10,008 | 556 | 24420 | 12) ves | 644 | 1,200 | SsnayssmieLssiien | 5-67 |-5-6 | | /az7emiman)4 | v 
Virtex-II Family - 1.5 Volt .15um Eight Layer Metal Process
= | 40k | 8x8 | 256 | 57% | 5172 | 8 | 4 | 72 | 4 | 24420 | 4 | ves | 44 | 88 | ipr25,tvpeci-33, |_-4-5-6 | 04m] | 
ie | 80K | 6x8 | 512 | 1,152 | 1,024 | 16 | 8 | 144 | 8 | 24/20 | 4 | ves | 60 | 120 | typs-33,1vps-25, |_-4-5-6 | oom] |_| 
i | 250K | 24x16 | 1536 | 3,456 | 3,072 | 48 | 24 | 432 | 24 | 24/420 | 8 | YES | 100 | 200 |IVDSEXT-33, LVDSEXT-25,| -4-5 -6 | tm | 
| 500K | 32x24 | 3,072 | 6912 | 6144 | 96 | 32 | 576 | 32 | 24/420 | 8 | ves | 132 | 264 | BLVDS-25,ULVDS-25, | -4-5-6 2amM{ || 
| iM | 40x32 | 5,120 | 11,520 | 10,240 | 160 | 40 | 720 | 40 | 24/420 | 8 | ves | 216 | 432 | LVTTLLvCMoOs33, | -4-5-6 = 4iM{ | 
15M | 48x40 | 7,680 | 17,280 | 15,360 | 240 | 48 | 864 | 48 | 24/420 | 8 | YES | 264 | 528 | LVCMOS25, \VCMOS18, | -4-5 -6 =| 2) srt | ioe | 
| am | 56x48 | 10,752 | 24,192 | 21,504 | 336 | 56 | 1,008 | 56 | 24/420 | 8 | YES | 312 | 624 |LVCMOS15, PCI33, PCI66,| -4-5 -6 see ee 
| 3m_| 64x56 | 14,336 | 32,256 | 28,672 | 448 | 96 | 1,728 | 96 | 24/420 | 12 | YES | 360 | 720 |PCI-X,GTL,GTL+,HSTLI,| -4-5-6 105M; | | w 
912__| HSTL I, HSTL Il, HSTLIV, | -4 -5 -6 2 
1,104 | SSTL2I, SSTL2I, SSTL3 1, | -4-5 -6 219m] sdf 
8M 112x104 | 46,592 | 104,832 | 93,184 | 1,456 | 168 | 3,024 | 168 24/420 2 YES | 554 | 1,108 | SSTL3 II, AGP, AGP-2X 4-5 29.1M Vv 


Note: 1. System Gates include 20-30% of CLBs used as RAM
2. Logic cell = (1) 4 Input (LUT) Look Up Table + Flip Flop + Carry Logic.
3. DCM: Digital Clock Management
4. Virtex-II Series EasyPath solution available to provide a no risk, no effort cost reduction path for volume production.
* System gate count not meaningful for Virtex-II Pro devices with immersed special blocks such as PowerPC processors and multi-gigabit transceivers.
** The FF1148 and FF1696 packages support a higher number of user I/O and zero RocketIO multi-gigabit transceivers.
Important: Verify all Data with Device Data Sheet (http://www.xilinx.com/partinfo/databook.htm)
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Xilinx Spartan FPGAs 


PRODUCT SELECTION MATRIX 


CLB Resources BLK RAM CLK Resources 1/O Features 






Spartan-IIE Family — 1.8 Volt .18/.15um Six Layer Metal Process 




















































| 50K [| 16x24 | 768 | 1,728 | 1,536 | 24k | 8 | 32k | NA | 25/320 | 4 | ves | Yes | NA | 83 | 182 | — LTILIVCMOS2, 2 ee 
| 100k | 20x30 | 1,200 | 2700 | 2400 | 37k | 10 | 40k | NA | 25/320 | 4 | Yes | YES | NA | 86 | 202 | LvCMOst8,PC33,PcIé6, | -6-7 | 6 0.9M 
265 | GTLGTL+,HSTLLHSTLM, | 6-7 | -6 1AM 
289_| HSTLNSSTI3L,SSTI3I, |_-6-7_ | -6 | o|& | 14M 
SSM12,STL2UAGP-2X, | -6-7 | 6 | | | 19M 
410_| CTLLVDS,BLVDSLPECL | -6-7 | 6 | 2.7M 
Spartan-II Family — 2.5 Volt .22/.18um Six Layer Metal Process 
| 15K | 8x12 | 192 | 432_ | 384_ | 6K | 4 | 16K _| NA | 25/200 | 4 | Ves | YES | NA | NA | 86 |  LVTTL LVCMOS2, | _-5- 0.2M 
| 30K [| t2xie | 432 | 972 | 864 | 135K | 6 | 24k | NA | 25/200 | 4 | Yes | YES | NA | NA | 132_|  PCI33 (3.3V & 5V), 0.40 
50K | 16x24 | 768 | 1,728 | 1,536 | 24k | 8 | 32K | NA | 25/200 | 4 | YES | YES | NA | NA | 176 | PCI66 (3.3V), GTL, GTLt, a. ja. |_0.6M 
196_| HSTL |, HSTL II, HSTL IV, =)5 | 08M 
260 _| SSTL3 |, SSTL3 Il, SSTL2 | 1AM 
284 | SSTL2 I, AGP-2X, CT aM 
Spartan-XL Family — 3.3 Volt 
TTL, LVTTL, CMOS, ES 0.05M 
LVMOS, PCI 4-5 0.09M 
20K Aes B | 5 [018M 
Ae 0.25M 
40K 28 x 28 784 1,862 1,568 24.5K | NA NA NA NA NA} NA | NA | NA | NA | 224 4-5 -4 0.33M 


PACKAGE OPTIONS AND USER I/O 


Note: 1. System Gates include 20-30% of CLBs used as RAM 
2. Logic Cell is defined as a 4 input LUT and a register 


Important: Verify all Data with Device Data Sheet 
(http://www.xilinx.com/spartan) 





77 1112 | 1601192 | 224 Numbers indicated in the matrix are the maximum number of user I/O's for 
that package and device combination. 
nae ef 61) | aa 


Automotive products are highlighted: 4 a Xilinx IQ Solutions for 
240 192} 192  -40C to +125C junction temperature for FPGAs =I’ ee 
VQFP Packages (VQ) 


100 PTT yy coleo) |] ] mm 77 [77 


TQFP Packages (TQ) 


144 no | | Llc |l Pe (112 (BAI) 


Chip Scale Packages — wire-bond chip-scale BGA (0.8 mm ball spacing) 


144 86 | 92 He? 
280 192 | 224 
cea is ETE IETS) 


PLCC Packages 











PQFP Packages (PQ) 
208 





FGA Packages (FT) — wire-bond fine-pitch thin BGA (1.0 mm ball spacing) 
256 182 | 

FGA Packages (FG) — wire-bond fine-pitch BGA (1.0 mm ball spacing) 

hey) 

456 

676 

BGA Packages 


256 ate deet ates ee irs P| | | 192 
Xilinx IQ Solutions


Part Number | Speed | Package | Voltage | Description

XC9536XL | 10 ns/100 MHz | VQ44, VQ64 | 3.3V | 36 Macrocells (800 Gates), ISP, JTAG, Bus Hold & I/P Hysteresis
XC9572XL | 10 ns/100 MHz | VQ64, TQ100 | 3.3V | 72 Macrocells (1,600 Gates), ISP, JTAG, Bus Hold & I/P Hysteresis

XCR3032XL | 10 ns/100 MHz | VQ44 | 3.3V | 32 Macrocells (800 Gates), Low Power, Slew Rate Control, ISP & JTAG
XCR3064XL | 10 ns/100 MHz | VQ44, VQ100 | 3.3V | 64 Macrocells (1,600 Gates), Low Power, Slew Rate Control, ISP & JTAG
XCR3128XL | 10 ns/100 MHz | VQ100, TQ144 | 3.3V | 128 Macrocells (3,200 Gates), Low Power, Slew Rate Control, ISP & JTAG
XCR3256XL | 10 ns/100 MHz | TQ144, PQ208 | 3.3V | 256 Macrocells (6,400 Gates), Low Power, Slew Rate Control, ISP & JTAG
XCR3384XL | 10 ns/100 MHz | PQ208 | 3.3V | 384 Macrocells (9,600 Gates), Low Power, Slew Rate Control, ISP & JTAG
XCR3512XL | 10 ns/100 MHz | PQ208 | 3.3V | 512 Macrocells (12,800 Gates), Low Power, Slew Rate Control, ISP & JTAG

XC2C32 | 6 ns/145 MHz | | 1.8V | 32 Macrocells (800 Gates), 6 I/O Standards, Slew Rate Control, Clock Doubler, Bus Hold, I/P Hysteresis. Ultra low power.
XC2C64 | 7.5 ns/127 MHz | VQ44, VQ100 | 1.8V | 64 Macrocells (1,600 Gates), 6 I/O Standards, Slew Rate Control, Clock Doubler, Bus Hold, I/P Hysteresis. Ultra low power.
XC2C128 | 7.5 ns/127 MHz | VQ44, VQ100 | 1.8V | 128 Macrocells (3,200 Gates), 9 I/O Standards, Slew Rate Control, Clock Doubler, Clock Divider, CoolClock, DataGate, Bus Hold, I/P Hysteresis. Ultra low power.
XC2C256 | 7.5 ns/127 MHz | VQ100, TQ144 | 1.8V | 256 Macrocells (6,400 Gates), 9 I/O Standards, Slew Rate Control, Clock Doubler, Clock Divider, CoolClock, DataGate, Bus Hold, I/P Hysteresis. Ultra low power.
XC2C384 | 10 ns/100 MHz | TQ144, PQ208 | 1.8V | 384 Macrocells (9,600 Gates), 9 I/O Standards, Slew Rate Control, Clock Doubler, Clock Divider, CoolClock, DataGate, Bus Hold, I/P Hysteresis. Ultra low power.
XC2C512 | 10 ns/100 MHz | PQ208 | 1.8V | 512 Macrocells (12,800 Gates), 9 I/O Standards, Slew Rate Control, Clock Doubler, Clock Divider, CoolClock, DataGate, Bus Hold, I/P Hysteresis. Ultra low power.

Note: See page 96 for CPLD IQ devices Package Options and User I/O.


Part Number | Speed Grade | Package | Voltage | Description

Spartan-XL
XCS05XL | | | 3.3V | Low cost FPGA with power down pin, 5V tol. I/O, 5,000 Gate, 238 logic cells, 100 CLBs.
XCS10XL | | | 3.3V | Low cost FPGA with power down pin, 5V tol. I/O, 10,000 Gate, 466 logic cells, 196 CLBs.
XCS20XL | | | 3.3V | Low cost FPGA with power down pin, 5V tol. I/O, 20,000 Gate, 950 logic cells, 400 CLBs.
XCS30XL | | | 3.3V | Low cost FPGA with power down pin, 5V tol. I/O, 30,000 Gate, 1,368 logic cells, 576 CLBs.
XCS40XL | | PQ208, BG256 | 3.3V | Low cost FPGA with power down pin, 5V tol. I/O, 40,000 Gate, 1,862 logic cells, 784 CLBs.

Spartan-II
XC2S15 | | TQ144 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 15,000 Gate, 432 logic cells, 96 CLBs, 4 block RAM blocks, 4 DLLs.
XC2S30 | | TQ144, PQ208 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 30,000 Gate, 972 logic cells, 216 CLBs, 6 block RAM blocks, 4 DLLs.
XC2S50 | | TQ144, PQ208, FG256 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 50,000 Gate, 1,728 logic cells, 384 CLBs, 8 block RAM blocks, 4 DLLs.
XC2S100 | | TQ144, PQ208, FG256 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 100,000 Gate, 2,700 logic cells, 600 CLBs, 10 block RAM blocks, 4 DLLs.
XC2S150 | | PQ208, FG256 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 150,000 Gate, 3,888 logic cells, 864 CLBs, 12 block RAM blocks, 4 DLLs.
XC2S200 | | PQ208, FG456 | 2.5V | High volume FPGA, on-chip RAM, 16 I/O standards, 200,000 Gate, 5,292 logic cells, 1,176 CLBs, 14 block RAM blocks, 4 DLLs.

Spartan-IIE
XC2S50E | | TQ144, PQ208, FT256 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 50,000 Gate, 1,728 logic cells, 384 CLBs, 8 block RAM blocks, 4 DLLs.
XC2S100E | | TQ144, PQ208, FT256 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 100,000 Gate, 2,700 logic cells, 600 CLBs, 10 block RAM blocks, 4 DLLs.
XC2S150E | | PQ208, FT256 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 150,000 Gate, 3,888 logic cells, 864 CLBs, 12 block RAM blocks, 4 DLLs.
XC2S200E | | PQ208, FT256 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 200,000 Gate, 5,292 logic cells, 1,176 CLBs, 14 block RAM blocks, 4 DLLs.
XC2S300E | | PQ208, FG456 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 300,000 Gate, 6,912 logic cells, 1,536 CLBs, 16 block RAM blocks, 4 DLLs.
XC2S400E | | FT256, FG456, FG676 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 400,000 Gate, 10,800 logic cells, 2,400 CLBs, 40 block RAM blocks, 4 DLLs.
XC2S600E | | FG456, FG676 | 1.8V | High volume FPGA, on-chip RAM, 19 I/O standards, 600,000 Gate, 15,552 logic cells, 3,456 CLBs, 72 block RAM blocks, 4 DLLs.


Note: See page 93 for Spartan IQ devices Package Options and User I/O.
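
The CLB and logic-cell counts listed for the Spartan-II and Spartan-IIE devices above follow a fixed ratio of 4.5 logic cells per CLB (for example, 432 / 96). The short sketch below checks that ratio against figures copied from the descriptions. The 4.5 factor is inferred from the listed numbers rather than stated in the text, and the names `TABLE_ROWS` and `logic_cells` are hypothetical.

```python
# (CLBs, logic cells) pairs copied from the Spartan-II / Spartan-IIE descriptions above.
TABLE_ROWS = {
    "XC2S15": (96, 432),
    "XC2S30": (216, 972),
    "XC2S50": (384, 1728),
    "XC2S100": (600, 2700),
    "XC2S150": (864, 3888),
    "XC2S200": (1176, 5292),
    "XC2S300E": (1536, 6912),
    "XC2S400E": (2400, 10800),
}


def logic_cells(clbs: int) -> int:
    """Assumption: 4.5 logic cells per CLB, computed in integers as 9/2."""
    return clbs * 9 // 2


# Every listed row should satisfy the inferred ratio.
for part, (clbs, cells) in TABLE_ROWS.items():
    assert logic_cells(clbs) == cells, part

# Apply the same ratio to the 3,456-CLB XC2S600E.
print(logic_cells(3456))
```

A check like this is a quick way to spot transcription slips in a selection table: a row whose logic-cell count does not match its CLB count stands out immediately.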
Xilinx Configuration Storage Solutions





Min board space 


(@) 
= 
a Compression 





25 


> 
= 
”n 
= 
@ 
fan) 
> 
tes 
S) 
= 
@ 
= 


16 Mbit 


SystemACE CF up to 8 Gbit 


SystemACE MPM 1)-42.25 em? |) Yes 
32 Mbit 


64 Mbit 


a Number of Components 





16 Mbit | 3 Yes 
32 Mbit 


64 Mbit 


SystemACE SC Custom 


In-System Programming (ISP) Configuration PROMs 
asec | | viv] fy aaviy|y 
a a EE 
me | | ly iy [fy aviv [y 
fr ee EE OO 
awe Te av fy 


One-Time Programmable (OTP) Configuration PROMs 


Posey | iv iy iy. | [3aviy |y 
wep fy fav fy fy 
Dawe aay fy 
re 


TS Wstellaco te Alia 


Xilinx Home Page 
http://www.xilinx.com 


Xilinx Online Support 
http://www.xilinx.com/support/support.htm 


Xilinx IP Center 
http://www.xilinx.com/ipcenter/index.htm 

Product         FPGA Configuration Mode                                        Multiple Designs   Max Config. Speed   Storage
SystemACE CF    JTAG                                                           Unlimited          30 Mbit/sec         CompactFlash
SystemACE MPM   SelectMAP (up to 4 FPGA), Slave-Serial (up to 8 FPGA chains)   Up to 8            152 Mbit/sec        AMD Flash Memory
SystemACE SC    SelectMAP (up to 4 FPGA), Slave-Serial (up to 8 FPGA chains)                      152 Mbit/sec        AMD Flash Memory

XC17S300A 


XC17S200A 


XC17S30XL 


XC17S40XL 


Xilinx Education Center 
http://www.xilinx.com/support/education-home.htm 


Xilinx Tutorial Center 
http://www.xilinx.com/support/techsup/tutorials/index.htm 


Xilinx WebPACK 
http://www.xilinx.com/sxpresso/webpack.htm 
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Xilinx Software 











Feature ISE WebPACK ISE BaseX ISE Foundation ISE Alliance 


Virtex™ Series 
    ISE WebPACK: Virtex-E: V50E to V300E; Virtex-II: 2V40 to 2V250; Virtex-II Pro: 2VP2 
    ISE BaseX: Virtex: V50 to V300; Virtex-E: V50E to V300E; Virtex-II: 2V40 to 2V250; Virtex-II Pro: 2VP2 
    ISE Foundation: ALL 
    ISE Alliance: ALL 

Spartan™-II/IIE Families ALL (except XC2S400E and XC2S600E) ALL ALL ALL 
CoolRunner™ XPLA3 / CoolRunner-II ALL ALL ALL ALL 
XC9500™ Series ALL ALL ALL ALL 
Educational Services Yes 
Design Services Sold as an Option 
Support Services Yes 
Schematic Editor No 
HDL Editor Yes 
State Diagram Editor No 
CORE Generator System Yes Yes Yes 
PACE (Pinout and Area Constraint Editor) Yes Yes Yes Yes 
Architecture Wizards No Yes Yes Yes 

DCM - Digital Clock Management 

MGT - Multi-Gigabit Transceivers 
3rd Party RTL Checker Support Yes 
Xilinx System Generator for DSP Sold as an Option 
GNU Embedded Tools Yes 

GCC — GNU Compiler 

GDB — GNU Software Debugger 
WindRiver Xilinx Edition Development Tools Sold as an Option Sold as an Option Sold as an Option 

Diab C/C++ Compiler 

SingleStep Debugger 

visionPROBE II target connection 
Xilinx Synthesis Technology (XST) No 
Synplicity Synplify/Pro Integrated Interface (PC Only) 
Synplicity Amplify Physical Synthesis Support Yes 
Leonardo Spectrum Integrated Interface 
Synopsys FPGA Compiler II EDIF Interface 

No 

iMPACT Yes Yes 
FloorPlanner Yes Yes Yes Yes 
Xilinx Constraints Editor Yes Yes 
Timing Driven Place & Route Yes Yes Yes Yes 
System ACE Configuration Manager Yes Yes 
Modular Design Yes Yes 
Timing Improvement Wizard Yes Yes 
IBIS Models Yes Yes 
STAMP Models Yes 
LMG SmartModels Yes (Available from Synopsys) 
HSPICE Models* Yes 
HDL Bencher™ 2 No 
ModelSim® Xilinx Edition (MXE II) ModelSim XE II Starter** 
Static Timing Analyzer Yes 
ChipScope PRO No Sold as an Option Sold as an Option Sold as an Option 
FPGA Editor with Probe Yes Yes 
ChipViewer Yes Yes Yes Yes 
XPower (Power Analysis) Yes Yes 
3rd Party Equivalency Checking Support Yes Yes 
SMARTModels for PPC and Rocket I/O Yes Yes 





Yes Yes 


PC (MS Windows 2000/MS Windows XP) | PC Only (MS Windows 2000/MS Windows XP) | PC (MS Windows 2000/MS Windows XP), Sun Solaris, Linux | PC (MS Windows 2000/MS Windows XP), Sun Solaris, Linux 


For more information on the complete list of Xilinx IP products, visit the Xilinx IP Center at http://www.xilinx.com/ipcenter 


3rd Party Simulator Support 





* HSPICE Models available at the Xilinx Design Tools Center at www.xilinx.com/ise. 
** MXE II supports the simulation of designs up to 1 million system gates and is sold as an option. For more information, visit the Xilinx Design Tools Center at www.xilinx.com/ise. 
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Xilinx Global Services 


support.xilinx.com 


• Sign up for personalized email alerts at mysupport.xilinx.com 
• Search our knowledge database 
• Troubleshoot with Problem Solvers 
• Consult with engineers in Forums 


Xilinx Productivity Advantage Program 
(XPA) 


The XPA provides customers with everything they need to 
improve their designs — Software, Education and Support 
Services, IP cores and demo boards — in one package. 

A single purchase order allows their designers to get what 
they need, when they need it, without the worry of ordering 
each piece of the solution separately. 


Two types of XPA are available: a custom XPA and the 
prepackaged “XPA Seat”. The “XPA Seat” is primarily for 
individuals or small design teams. 


http://support.xilinx.com/support/gsd/xpa_program.htm 


Xilinx Design Services (XDS) 


• XDS provides extensive FPGA hardware and 
embedded software design experience backed 
by industry-recognized experts and resources to 
solve even the most complex design challenge. 

• System Architecture Consulting: Provide 
engineering services to define system architecture 
and partitioning for design specification. 

• Custom Design Solutions: Project designed, 
verified, and delivered to mutually agreed upon 
design specifications. 

• IP Core Development, Optimization, 
Integration, Modification, and Verification: 
Modify, integrate, and optimize customer 
intellectual property or third party cores to work 
with Xilinx technology. Add customer-required 
special features to Xilinx IP cores or third party 
cores. Perform integration, optimization, and 
verification of IP cores in Xilinx technology. 

• Embedded Software: Develop complex 
embedded software with real-time constraints, 
using hardware/software co-design techniques. 

• Conversions: Convert ASIC designs and other 
FPGAs to Xilinx technology and devices. 


Education Services Contacts 


North America: 877-XLX-CLAS (877-959-2527) 
http://support.xilinx.com/support/training/training.htm 


Europe: +44-870-7350-548 
eurotraining@xilinx.com 


Japan: +81-03-5321-7750 
http://support.xilinx.co.jp/support/education-home.htm 


Asia Pacific: +03-5321-7711 
http://support.xilinx.com/support/education-home.htm 
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Xilinx Global Services 


Part Number 

Education Services 
FPGA13000-5-ILT 
FPGA23000-5-ILT 
FPGA33000-5-ILT 
LANG11000-5-ILT 
LANG21000-5-ILT 
LANG12000-5-ILT 
PCI8000-4-ILT 
PCI28000-4-ILT 
ASIC25000-5-ILT 
DSP2000-3-ILT 
DSP-10000-5-ILT 
RIO22000-5-ILT 
PROMO-5004-5-ILT 
PROMO-5003-5-LEL 
EMBD-21000-5-ILT 

Platinum Technical Service 
SC-PLAT-SVC-10 
SC-PLAT-SITE-50 
SC-PLAT-SITE-100 
SC-PLAT-SITE-150 

Titanium Technical Service 
PS-TEC-SERV 

Design Services 
DE-DES-SERV 
Xilinx Productivity Advantage 
DS-XPA 

DS-ISE-ALI-XPA 
DS-ISE-FND-XPA 


Titanium Technical Service 


• Dedicated application engineer 
• Design Flow methodology coaching 
• Contract based service 
• Factory escalation process 
• Timing Closure expertise 
• Service at customer site or Xilinx site 


Design Services Contacts 


North America & Asia: 
Richard Fodor: 408-626-4256 
Mike Barone: 512-238-1473 


Europe: 
Alex Hillier: +44-870-7350-516 
Martina Finnerty: +353-1-4032469 


designservices@xilinx.com 


Product Description 


Fundamentals of FPGA Design 


PCI CORE Basics 


Embedded Systems 


1 Seat Platinum Technical Service w/10 education credits 


Titanium Technical Service (minimum 40 hours) 


Design Services Contract 


Custom XPA Packaged Solution 


XPA Seat, ISE Foundation 


Platinum Technical Service 


• Access to Senior Applications Engineers 
• Dedicated Toll Free Number* 
• Priority Case Resolution 
• Ten Education Credits 
• Electronic Newsletter 
• Formal Escalation Process 
• Service Packs and Software Updates 
• Application Engineer to Customer Ratio, 2x Gold Level 

*Toll free number available in US only; dedicated local numbers 
available across Europe. 


XPA Contacts 


North America: 800-888-FPGA (3742) 
fpga.xilinx.com 


Europe: 
Stuart Elston: +44-870-7350-532 


Europe: 


Japan: 


North America: 
Telesales: 1-800-888-3742 


Duration 
hours 


8 


Platinum Technical Service site license up to 50 customers 


Platinum Technical Service site license for 51-100 customers 


Platinum Technical Service site license for 101-150 customers 


N/A 
N/A 


N/A 





XPA Seat, ISE Alliance N/A 


N/A 


+81-3-5321-7730 


or japantitanium@xilinx.com 


Availability 





Now 
Now 
Now 
Now 
Now 
Now 


Now 


Now 


Now 


Now 
Now 
Now 
Now 
Now 


Now 


Now 
Now 
Now 


Now 


Now 


Now 


Now 


Now 


Now 


Titanium Technical Service Contacts 


Stuart Elston: +44-870-7350-632 

Silicon Proven IP 





Eureka Technology offers a wide range of silicon proven IP cores 
for SoC designs. By using our free web-based SoCDesigner™ software, our 
customers are able to combine the benefits of using pre-verified silicon proven 
IP cores with the ability to customize the SoC design according to their 
exact specifications. Our rich repertoire of IP cores enables our customers to 
build a wide variety of system controller/CPU companion chips to support 
different CPU and bus standards such as ARM™, PowerPC™, ARC™, 
MIPS™, SH™2/3/4, PCI™, Cardbus™ and PCMCIA™. 

To learn more about our silicon proven IP cores and SoCDesigner, please visit 
www.eurekatech.com/socdesigner 

Eureka Technology 
Call today for more information 
Tel: +1 650 960 3800 
4962 El Camino Real, Los Altos, CA 94022 USA 


© 2003 Eureka Technology Inc. All rights reserved. The Eureka Technology logo and SoCDesigner are trademarks of Eureka Technology Inc. All other trademarks are properties of their respective owners. 


Got-a-Lotta-Pins. 
Not-a-Lotta-Price. 








With up to 514 I/Os, the newly extended Spartan-IIE family offers the lowest cost per pin 
in the industry. That’s why Spartan-IIE FPGAs are the first choice of designers seeking a 
low-cost, higher density solution for high-volume applications. 

All the I/O. All the density. No compromises. 
Spartan-IIE FPGAs achieve minimum die size without sacrificing I/Os. With double the 
competitive I/O count, and densities ranging from 50,000 to 600,000 system gates, the 
Spartan-IIE series is the only true solution to ASIC headaches. 

Fits your budget. Hits your market. 
Spartan-IIE FPGAs lead the way in quick-turn, cost-sensitive markets like consumer, 
digital video, home networking, automotive and much more. Supporting 19 I/O 
standards and driven by Xilinx’s proven, lightning-fast ISE 5.1i software, there’s no 
better low-cost solution shipping today. 

Visit www.xilinx.com/spartan2e/ today and find out how you can get all the pins 
you need ... at the price you want. 


XILINX® 

The Programmable Logic Company™ 
www.xilinx.com/spartan2e/ 
FORTUNE 2003 
100 BEST COMPANIES TO WORK FOR 
© 2003 Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124. Europe +44-870-7350-600; Japan +81-3-5321-7711; Asia Pacific +852-2-424-5200; Xilinx and Spartan are registered trademarks, WebPACK is a trademark, and The Programmable Logic Company is a service mark of Xilinx, Inc. 
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